Reputation: 123
So I'm trying to read an excel file with C# and the document is 181MB. I have tried using Microsoft.Office.Interop.Excel, OpenXML, ClosedXML, and ExcelDataReader. I wasn't able to get OpenXML to work and ClosedXML seems to have issues with large excel file (it also takes at least 6 minutes to read the file). I like ExcelDataReader the most since I can read the data table like an array but it does take 4-5 minutes to read the file which is much faster than Interlop, but it's still a long wait. I'm considering converting the excel document into a csv file, but when I did that the size went from 181 MB to 248 MB so I'm unsure if it will be more efficient. It also forces the users to do an extra step to convert their files into a csv, but if the performance is worth it I will attempt this route.
Unfortunately, I am not able to pre-determine how many columns and rows the excel document will have as the users will be using openFileDialog to select a file.
Is ExcelDataReader the best way to go or is there a better solution?
Here's my current code in case there's some improvements I can make:
OpenFileDialog openFileDialog = new OpenFileDialog();
openFileDialog.Filter = "Excel Files|*.xls;*.xlsx;*.slxm";
if (openFileDialog.ShowDialog() == true)
{
using (var stream = File.Open(openFileDialog.FileName, FileMode.Open, FileAccess.Read))
{
using (var reader = ExcelReaderFactory.CreateReader(stream))
{
//results will be in dataSet.Tables
var dataSet = reader.AsDataSet();
var dataTable = dataSet.Tables[0];
int r = 0;
for(int c = 0; c < dataTable.Columns.Count; c += 3)
{
TagListData.Add(new TagClass { IsTagSelected = false, TagName = dataTable.Rows[r][c].ToString(), rIndex = r, cIndex = c });
}
}
}
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
}
Upvotes: 0
Views: 2386
Reputation: 676
Idea 1: There is some overhead with ExcelDataReader's AsDataSet - so it's a good idea to use the reader interface directly when working with large sheets. It implements the IDataReader interface and provides pr-row level access to the data:
using (var reader = ExcelReaderFactory.CreateReader(stream)) {
reader.Read();
for(int c = 0; c < reader.FieldCount; c += 3) {
TagListData.Add(new TagClass { IsTagSelected = false, TagName = Convert.ToString(reader.GetValue(c)), rIndex = r, cIndex = c });
}
}
Idea 2: Try to pass ExcelDataSetConfiguration.UseColumnDataType = false
to AsDataSet, this eliminates an internal pass and reduces memory pressure, so should improve performance noticably with large sheets
Upvotes: 1