| elements.
Transformer: Translates inline CSS/HTML alignment, backgrounds, and fonts into native Excel OpenXML styles.
Writer Layer: Serializes data immediately into compressed OpenXML (zip) packages using low-overhead writers.

📦 Key Dependencies to Choose From
Do not write an HTML parser or zip serializer from scratch. Mix and match these highly optimized libraries:

HTML Parsing
AngleSharp: Fast, fully HTML5-compliant, and builds a very efficient DOM tree.
HtmlAgilityPack: A widely used parser with XPath support, convenient for extracting large tables.

High-Performance Excel Generation
LargeXlsx: An ultra-fast, stream-only XLSX writer designed explicitly for low memory allocation.
FastExcel: Focuses on raw performance and minimal memory usage.
EPPlus or ClosedXML: Rich feature support, though they hold the whole workbook in memory (use for medium-sized datasets).

🏎️ Core Performance Tactics
1. Avoid Managed DOM Trees for Huge Files
Loading massive HTML into a complete XmlDocument or HtmlDocument can throw an OutOfMemoryException on large inputs. Prefer a forward-only reader (for example, XmlReader when the input is well-formed XHTML) or a streaming tokenizer, so nodes are read sequentially without keeping the entire tree in memory.

2. Prevent Boxing and String Duplication
When parsing cell text, avoid repeated .ToString() calls on matched elements. Read each cell's text once and pass that string reference directly to your Excel stream writer, which keeps allocation churn and pressure on the .NET Garbage Collector (GC) low.

3. Reuse Styles (The Style Dictionary)
Excel manages formatting through a shared stylesheet. Creating a new style object for every single cell bloats that stylesheet, can run into Excel's limit on unique cell formats, and slows down both writing and rendering. Map your incoming HTML styles to shared style instances through a lookup table:

    // Use a lookup dictionary to reuse style references.
    // Key: normalized inline style string; value: the shared style (types assumed here).
    var styleMap = new Dictionary<string, XlsxStyle>();

💻 Code Blueprint (High Performance)
This example combines HtmlAgilityPack for targeting the tables with LargeXlsx to stream the Excel data directly to disk.
    using System;
    using System.Collections.Generic;
    using System.Drawing;
    using System.IO;
    using HtmlAgilityPack;
    using LargeXlsx;

    public class HighPerfHtmlToExcelConverter
    {
        public static void Convert(string htmlFilePath, string excelFilePath)
        {
            // 1. Initialize the HTML parser
            var doc = new HtmlDocument();
            doc.Load(htmlFilePath); // Builds the full DOM; for true streaming, parse nodes chunk-by-chunk

            // 2. Select tables using optimized XPaths
            var tables = doc.DocumentNode.SelectNodes("//table");
            if (tables == null) return;

            // 3. Initialize the stream-based Excel writer
            using var stream = new FileStream(excelFilePath, FileMode.Create, FileAccess.Write, FileShare.None);
            using var xlsxWriter = new XlsxWriter(stream);

            // Predefine structural styles to avoid instantiation inside loops.
            // LargeXlsx fills take a System.Drawing.Color, and the style
            // constructor also expects a number format.
            var headerStyle = new XlsxStyle(
                XlsxFont.Default.WithBold(),
                new XlsxFill(ColorTranslator.FromHtml("#4CAF50")), // Green header background
                XlsxBorder.None,
                XlsxNumberFormat.General,
                XlsxAlignment.Default
            );

            int tableIndex = 1;
            foreach (var table in tables)
            {
                xlsxWriter.BeginWorksheet($"Table {tableIndex++}");

                // Extract rows sequentially
                var rows = table.SelectNodes(".//tr");
                if (rows == null) continue;

                foreach (var row in rows)
                {
                    xlsxWriter.BeginRow();

                    // Handle headers and data cells uniformly
                    var cells = row.SelectNodes("./th|./td");
                    if (cells == null) continue;

                    foreach (var cell in cells)
                    {
                        // Decode entities such as &amp; before writing the text
                        string text = HtmlEntity.DeEntitize(cell.InnerText).Trim();
                        bool isHeader = cell.Name == "th";

                        // Smart-cast numeric data types to prevent text formatting alerts in Excel
                        if (!isHeader && double.TryParse(text, out double numericValue))
                        {
                            xlsxWriter.Write(numericValue);
                        }
                        else
                        {
                            xlsxWriter.Write(text, isHeader ? headerStyle : XlsxStyle.Default);
                        }
                    }
                }
            }
        }
    }

🎨 Mapping CSS Styles to OpenXML
To make your converter fully functional, translate common HTML elements into OpenXML formatting rules:

| HTML / CSS Property | Excel OpenXML Equivalent | Implementation Strategy |
|---|---|---|
| colspan="X" / rowspan="Y" | Merged cells | Keep track of coordinates; skip covered cells and register the merged range in your writer layer (in LargeXlsx, SkipColumns() and AddMergedCell()). |
| background-color: #HEX | XlsxFill built from a System.Drawing.Color | Parse the hex color and register the resulting fill in the stylesheet dictionary. |
| text-align: center | XlsxAlignment with Horizontal.Center | Read inline styles and apply the corresponding horizontal alignment. |
| <b> or font-weight: bold | XlsxFont.WithBold() | Toggle the bold flag on the active font style. |

🚀 Production Optimization Checklist
Turn off Automatic Type Conversions: If your HTML file only contains simple strings, disable automated date/number detection routines to increase throughput.
Buffer Input Streams: When loading raw content over network connections or sluggish drives, wrap the streams inside a BufferedStream with an 8192-byte buffer.
Process via a Job Queue: Wrap the utility inside a background processing system such as Hangfire or a .NET BackgroundService to safeguard web server threads against spikes during massive bulk downloads.
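
The first two checklist items can be sketched with the base class library alone. The names ConverterInput, OpenBuffered, and TryGetNumber are illustrative and do not come from any library mentioned here. Parsing with the invariant culture is also an assumption, chosen so that the same HTML converts identically on servers with different regional settings.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper: names and structure are illustrative only.
public static class ConverterInput
{
    // Opens the HTML source behind an 8192-byte BufferedStream.
    // Disposing the returned stream also disposes the underlying FileStream.
    public static Stream OpenBuffered(string path)
    {
        var file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
        return new BufferedStream(file, 8192);
    }

    // Returns true (and the parsed value) only when detection is enabled and
    // the text is a plain invariant-culture number. With detectNumbers set to
    // false, every cell is treated as text and no parsing work is done.
    public static bool TryGetNumber(string text, bool detectNumbers, out double value)
    {
        value = 0;
        if (!detectNumbers) return false;
        return double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out value);
    }
}
```

In the cell loop, a call such as ConverterInput.TryGetNumber(text, detectNumbers: false, out _) would route every value to the string writer, which is the cheap path when the source is known to contain only text.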
To help tailor this implementation, what is the average size of the HTML files you need to convert, and do you require support for complex styling like merged cells or embedded links?
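
If merged cells are required, the coordinate tracking that the colspan/rowspan row of the mapping table calls for might look like the sketch below. SpanCell, PlacedCell, and MergedCellLayout.Place are hypothetical names, and the sketch assumes C# 10 or later for record structs. It only computes zero-based grid positions. Passing them to a writer's merge call is left out, and a colspan that collides with an earlier rowspan is not resolved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; not part of any library named above.
public readonly record struct SpanCell(string Text, int ColSpan = 1, int RowSpan = 1);
public readonly record struct PlacedCell(string Text, int Row, int Column, int ColSpan, int RowSpan);

public static class MergedCellLayout
{
    // Assigns each cell its zero-based (row, column) position, skipping grid
    // slots already covered by a rowspan from an earlier row.
    public static List<PlacedCell> Place(SpanCell[][] rows)
    {
        var placed = new List<PlacedCell>();
        var occupied = new HashSet<(int Row, int Column)>();

        for (int r = 0; r < rows.Length; r++)
        {
            int c = 0;
            foreach (var cell in rows[r])
            {
                // Move right past slots claimed by earlier spans.
                while (occupied.Contains((r, c))) c++;

                // Treat missing or invalid span values as 1.
                int colSpan = Math.Max(1, cell.ColSpan);
                int rowSpan = Math.Max(1, cell.RowSpan);

                // Mark every slot this cell covers.
                for (int dr = 0; dr < rowSpan; dr++)
                    for (int dc = 0; dc < colSpan; dc++)
                        occupied.Add((r + dr, c + dc));

                placed.Add(new PlacedCell(cell.Text, r, c, colSpan, rowSpan));
                c += colSpan;
            }
        }
        return placed;
    }
}
```

For a first row holding A (rowspan 2) and B (colspan 2) and a second row holding C and D, this places A at (0, 0), B at (0, 1), C at (1, 1), and D at (1, 2). Excel rows and columns are one-based, so add 1 to each coordinate before handing it to a writer.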
|