Reputation: 26507
What is the most efficient way to enumerate every cell in every sheet in a workbook?
The method below seems to work reasonably for a workbook with ~130,000 cells. On my machine it took ~26 seconds to open the file and ~5 seconds to enumerate the cells . However I'm no Excel expert and wanted to validate this code snippet with the wider community.
DateTime timer = DateTime.Now;
Microsoft.Office.Interop.Excel.Application excelApplication = new Microsoft.Office.Interop.Excel.Application();
try
{
exampleFile = new FileInfo(Path.Combine(System.Environment.CurrentDirectory, "Large.xlsx"));
excelApplication.Workbooks.Open(exampleFile.FullName, false, false, missing, missing, missing, true, missing, missing, true, missing, missing, missing, missing, missing);
Console.WriteLine(string.Format("Took {0} seconds to open file", (DateTime.Now - timer).Seconds.ToString()));
timer = DateTime.Now;
foreach(Workbook workbook in excelApplication.Workbooks)
{
foreach(Worksheet sheet in workbook.Sheets)
{
int i = 0, iRowMax, iColMax;
string data = String.Empty;
Object[,] rangeData = (System.Object[,]) sheet.UsedRange.Cells.get_Value(missing);
if (rangeData != null)
{
iRowMax = rangeData.GetUpperBound(0);
iColMax = rangeData.GetUpperBound(1);
for (int iRow = 1; iRow < iRowMax; iRow++)
{
for(int iCol = 1; iCol < iColMax; iCol++)
{
data = rangeData[iRow, iCol] != null ? rangeData[iRow, iCol].ToString() : string.Empty;
if (i % 100 == 0)
{
Console.WriteLine(String.Format("Processed {0} cells.", i));
}
i++;
}
}
}
}
workbook.Close(false, missing, missing);
}
Console.WriteLine(string.Format("Took {0} seconds to parse file", (DateTime.Now - timer).Seconds.ToString()));
}
finally
{
excelApplication.Workbooks.Close();
excelApplication.Quit();
}
Edit:
Worth stating that I want to use PIA and interop in order to access properties of excel workbooks that are not exposed by API's that work directly with the Excel file.
Upvotes: 2
Views: 11762
Reputation: 36300
There is an open source implementation of an Excel reader and writer called Koogra. It allows you to read in the excel file and modify it using pure managed code. This would probably be much faster than the code you are using now.
Upvotes: 2
Reputation: 26632
I think, this is the most efficient way, how do it with PIA. Maybe will littlebit faster using "foreach" insted of "for", but it will not dramatic change.
If is efficiency your primary goal, you should work with excel files directly - without excel application.
Upvotes: 1
Reputation: 121294
Excel PIA Interop is really slow when you are doing things cell by cell.
You should select the range you want to extract, like you did with the Worksheet.UsedRange
property and then read the value of the whole range in one step, by invoking get_Value()
(or just simply by reading the Value
or Value2
property, I can't remember which one) on it.
This will yield an object[,]
, that is, a two dimensional array, which can be easily enumerated and is quick to be read.
EDIT: I just read your actual code and realized that it actually does what I proposed. Shame on me for not reading the question properly before answering. In that case, you cannot make it much faster. Excel PIA Interop is slow. If you need a quicker solution you will have to either migrate jExcelApi from Java to C# (not a terribly hard thing to do) or use some commercial component. I suggest to avoid the OLEDB interface at all costs, in order to keep your sanity.
Unrelated, but helpful tip: You should use the ?? operator. It is really handy. Instead of
data = rangeData[iRow, iCol] != null ? rangeData[iRow, iCol].ToString() : string.Empty;
you could just write
data = Convert.ToString(rangeData[iRow, iCol]) ?? string.Empty;
In that case, even String.Empty is not necessary since Convert.ToString(object) converts null
to an empty string anyway.
Upvotes: 2