ManInMoon
ManInMoon

Reputation: 7005

How to read an Excel spreadsheet in c# quickly

I am using Microsoft.Office.Interop.Excel to read a spreadsheet that is open in memory.

gXlWs = (Microsoft.Office.Interop.Excel.Worksheet)gXlApp.ActiveWorkbook.ActiveSheet;
int NumCols = 7;
string[] Fields = new string[NumCols];
string input = null;
int NumRow = 2;
while (Convert.ToString(((Microsoft.Office.Interop.Excel.Range)gXlWs.Cells[NumRow, 1]).Value2) != null)
{
    for (int c = 1; c <= NumCols; c++)
    {
        Fields[c-1] = Convert.ToString(((Microsoft.Office.Interop.Excel.Range)gXlWs.Cells[NumRow, c]).Value2);
    }
    NumRow++;

    //Do my other processing
}

I have 180,000 rows and this turns out be very slow. I am not sure the "Convert" is efficient. Is there anyway I could do this faster?

Moon

Upvotes: 10

Views: 41410

Answers (6)

I found really fast way to read excel in my specific way. I need to get it as a two dimensional array of string. With really big excel, it took about one hour in old way. In this way, I get my values in 20sec.

I am using this nugget: https://reposhub.com/dotnet/office/ExcelDataReader-ExcelDataReader.html

And here is my code:

DataSet result = null;
//https://reposhub.com/dotnet/office/ExcelDataReader-ExcelDataReader.html
using (var stream = File.Open(path, FileMode.Open, FileAccess.Read))
{
    // Auto-detect format, supports:
    //  - Binary Excel files (2.0-2003 format; *.xls)
    //  - OpenXml Excel files (2007 format; *.xlsx)
    using (var reader = ExcelReaderFactory.CreateReader(stream))
    {
        result = reader.AsDataSet();
    }
}

foreach (DataTable table in result.Tables)
{
    if (//my conditions)
    {
        continue;
    }

    var rows = table.AsEnumerable().ToArray();

    var dataTable = new string[table.Rows.Count][];//[table.Rows[0].ItemArray.Length];
    Parallel.For(0, rows.Length, new ParallelOptions { MaxDegreeOfParallelism = 8 },
        i =>
        {
            var row = rows[i];
            dataTable[i] = row.ItemArray.Select(x => x.ToString()).ToArray();                                    
        });

    importedList.Add(dataTable);
}

Upvotes: 1

Manvir Randhawa
Manvir Randhawa

Reputation: 1

Use the OleDB Method. That is the fastest as follows;

string con =
  @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\temp\test.xls;" + 
  @"Extended Properties='Excel 8.0;HDR=Yes;'";    
using(OleDbConnection connection = new OleDbConnection(con))
{
    connection.Open();
    OleDbCommand command = new OleDbCommand("select * from [Sheet1$]", connection); 
    using(OleDbDataReader dr = command.ExecuteReader())
    {
         while(dr.Read())
         {
             var row1Col0 = dr[0];
             Console.WriteLine(row1Col0);
         }
    }
}

Upvotes: 0

ManInMoon
ManInMoon

Reputation: 7005

Hi I found a very much faster way.

It is better to read the entire data in one go using "get_range". This loads the data into memory and I can loop through that like a normal array.

Microsoft.Office.Interop.Excel.Range range = gXlWs.get_Range("A1", "F188000");
object[,] values = (object[,])range.Value2;
int NumRow=1;
while (NumRow < values.GetLength(0))
{
    for (int c = 1; c <= NumCols; c++)
    {
        Fields[c - 1] = Convert.ToString(values[NumRow, c]);
    }
    NumRow++;
}

Upvotes: 31

Security Hound
Security Hound

Reputation: 2551

I am not sure the "Convert" is efficient. Is there anyway I could do this faster?

What makes you believe this? I promise you that Convert.ToString() is the most effective method in the code you posted. Your problem is that your looping through 180,000 records in an excel document...

You could split the work up since you know the number of row this is trival to do.

Why are you coverting Value2 to a string exactly?

Upvotes: 1

masaishi
masaishi

Reputation: 21

I guess it's not the Convert the source of "slowing"...

Actually, retrieving cell values is very slow.

I think this conversion is not necessary:

(Microsoft.Office.Interop.Excel.Range)gXlWs

It should work without that.

And you can ask directly:

gXlWs.Cells[NumRow, 1].Value != null

Try to move the entire range or, at least, the entire row to an object Matrix and work with it instead of the range itself.

Upvotes: 0

Yahia
Yahia

Reputation: 70369

There are several options - all involve some additional library:

  • OpenXML 2.0 (free library from MS) can be used to read/modify the content of an .xlsx so you can do with it what you want

  • some (commercial) 3rd-party libraries come with grid controls allowing you to do much more with excel files in your application (be it Winforms/WPF/ASP.NET...) like SpreadsheetGear, Aspose.Cells etc.

Upvotes: 4

Related Questions