Shazia Tabassum
Shazia Tabassum

Reputation: 135

EPPlus - How to Handle Rows beyond 1,048,576

I am generating transaction reports which can contain a large amount of data. But the issue is that excel only support the max rows of 1,048,576.

I am using EPPlus v6.2.3. It is crashing with the error Index Out of Bound on large data. I want to add new sheet in the workbook if data exceeds the max row count. How can I do that?

Below is my code snippet:

  public static ExcelPackage ExportToExcel(ExcelPackage excelPackage, DataTable dataTable, ExcelCellAddress startCellAddress)
            {
                //Create the worksheet
                ExcelWorksheet ws = excelPackage.Workbook.Worksheets[0];
    
                ws.Cells[startCellAddress.Address].LoadFromDataTable(dataTable, false);
                // Calculate the final column letter
                string finalColLetter = string.Empty;
                string colCharset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
                int colCharsetLen = colCharset.Length;
    
                if (dataTable.Columns.Count > colCharsetLen)
                {
                    finalColLetter = colCharset.Substring((dataTable.Columns.Count - 1) / colCharsetLen - 1, 1);
                }
    
                finalColLetter += colCharset.Substring((dataTable.Columns.Count - 1) % colCharsetLen, 1);
                string cellRange = string.Format("{0}:{1}{2}", startCellAddress.Address, finalColLetter, startCellAddress.Row + dataTable.Rows.Count - 1);
    
                ExcelRange range = ws.Cells[cellRange];
                // Assign borders
                SetNormalCellStyle(range);
    
                return excelPackage;
            }

Upvotes: 1

Views: 1209

Answers (3)

Shazia Tabassum
Shazia Tabassum

Reputation: 135

Finally I have achieved to add worksheet as the data exceeds the max no. of allowed rows in excel, using EPPlus and without looping on the large dataset.

Here is the sample code:

//Create the worksheet
            ExcelWorksheet ws = null;
            int prevAddedRows = 0;
            int excelMaxRows = 1048576;
            int totalRows = dataTable.Rows.Count;
            int maxCount = excelMaxRows;
            int startIndex = 0;
            int remainingRowsCount;
            ExcelCellAddress startCell = startCellAddress;

            int totalSheets = (totalRows / excelMaxRows) + 1;
            for (int i = 0; i < totalSheets; i++)
            {
                if (i == 0)
                    ws = excelPackage.Workbook.Worksheets[i];
                else
                {
                    ws = excelPackage.Workbook.Worksheets.Add(string.Format("Sheet{0}", i + 1));
                    startCell = new ExcelCellAddress("A1");
                }
                DataTable selectedDataTable = dataTable.AsEnumerable().Skip(startIndex).Take(maxCount).CopyToDataTable();
                ws.Cells[startCell.Address].LoadFromDataTable(selectedDataTable, false);
                prevAddedRows += maxCount;
                remainingRowsCount = totalRows - prevAddedRows;
                if (remainingRowsCount < excelMaxRows)
                {
                    maxCount = remainingRowsCount;
                }
                startIndex = prevAddedRows;              
            }
            return excelPackage;
        }

Upvotes: 1

Barreto
Barreto

Reputation: 432

You can divide you data from multiple sheets by creating a sheet in a loop

Whenever the data reaches your threshold of 1,048,576 start a new sheet

Without sample code on your question I took the liberty to craft a simple example that should work regardless of the package or version:

using OfficeOpenXml;
(...)

int worksheetIndex = 0, 
   row = 1, 
   dataPerWorksheet = 1_000_000,
   dataItems = 3_000_000;

ExcelWorksheet currentWorksheet = null!;

for (int i = 0; i < dataItems; i++)
{
    //check if you need a new sheet
    if (i % dataPerWorksheet == 0)
    {
        worksheetIndex++;
        currentWorksheet = package.Workbook.Worksheets.Add($"Data{worksheetIndex}");
        row = 1;
    }
    
    //Add data to the row on current sheet
    currentWorksheet.Cells[row, 1].Value = i; //I used i as the data
    row++;
}

With the addition of boiler plate this gave me one excel file with 3 worksheets filled with continuous count from 0 to 3_000_000, each sheet holding 1_000_000 rows of data.

I am working with OpenOffice but this should work with little to none modifications on Excel

Extra:

Regarding your edit, you can do the same with your example by modifying the above example to get data from DataTable

For example:

int worksheetIndex = 0,
    row = startCellAddress.Row,
    col = startCellAddress.Column
    dataPerWorksheet = 1_000_000;

//Declare worksheet
ExcelWorksheet ws = null;

for (int r = 0; r < dataTable.Rows.Count; r++)
{
    if (r % dataPerWorksheet == 0)
    {
        worksheetIndex++;
        ws = excelPackage.Workbook.Worksheets.Add($"Data{worksheetIndex}");
        row = startCellAddress.Row;
    }

    ws.Cells[Row:row, Col:col].Value = dataTable.Rows[r]["A"];
    ws.Cells[Row: row, Col: col+1].Value = dataTable.Rows[r]["B"];

    row++;
}

Regarding the comment: you need to loop over each row to copy the data from one row to the other. The method LoadFromDataTable() probably loops over all rows as well, just not on your side of the code.

Regarding performance I am not sure if there are better ways to do this, I am not an expert, but it took much longer to open the file than to create it.

Result

//rows in data= 1_500_000
//dataPerWorksheet = 500_000
ExportToExcel(package, data, new("B2"));

result

Your method with the multiple worksheets modification took 08s15ms and the file has 19,4 MB (20 348 432 bytes) with my dataset.

Upvotes: 1

MarkPflug
MarkPflug

Reputation: 29578

If you are willing to consider an alternative to EPPlus, I maintain an Excel library that might work for you: Sylvan.Data.Excel.

Here is a complete C# 11 example showing how to write data that exceeds the limit of a single sheet:

using Sylvan.Data;
using Sylvan.Data.Excel;
using System.Diagnostics;

var sw = Stopwatch.StartNew();

// synthesize some data, and expose it as a DbDataReader
var data = 
    Enumerable.Range(0, 2_000_000)
    .Select(i => new { Id = i, Date = DateTime.Today.AddHours(i), Value = Math.PI * i });
System.Data.Common.DbDataReader dr = data.AsDataReader();

// create an ExcelDataWriter, which can also write .xlsb files
using var edw = ExcelDataWriter.Create("report.xlsx");

bool done = false;
int page = 1;
do
{
    // write the data to the excel file
    var result = edw.Write(dr, "Report_" + page++);
    // result will identify the number of rows written
    // and IsComplete indicating if the entire DbDataReader was consumed.
    done = result.IsComplete;
} while (!done);

sw.Stop();
var size = new FileInfo("report.xlsx").Length;
Console.WriteLine($"Created {size} file in {sw.Elapsed}.");

On my machine, this produces a ~30MB file in about 3.8 seconds.

Created 30313078 file in 00:00:03.8263647.

Upvotes: 0

Related Questions