MHeads
MHeads

Reputation: 387

Read a big Excel document

I want to know what is the fastest way to read cells in Excel. I have an Excel file that contains 50000 rows and I wanna know how to read it fast. I just need to read the first column and with oledb connection it takes me like 15 seconds. Is there a faster way?

Thanks

Upvotes: 11

Views: 13392

Answers (5)

hamzahir
hamzahir

Reputation: 48

I was facing the same thing and i read in the office dev center :

http://social.msdn.microsoft.com/Forums/office/en-US/418ada31-8748-48d2-858b-d177326daa76/export-to-excel-open-xml-sdk-vs-microsoftofficeinteropexcel?forum=oxmlsdk

You have the two choices for manipulating Excel files :

  • Microsoft.Office.Interop.Excel that uses Excel.Application as an added layer for code execution
  • Open XML SDK that allows the developer to work directly with the closed file

there isn't much difference between the two but in your case where the performance is an issue you should use the Open XML SDK that may be a bit faster and don't need that much time opening a large file before processing. as you may read also in the link above and i quote :

Office for automation purposes is not supported. The Office applications were not designed to run without human supervision and have a nasty tendency to "hang"

a good start for learning the open xml sdk is provided in this link : http://msdn.microsoft.com/en-us/library/office/gg575571.aspx

Upvotes: 0

jiverson
jiverson

Reputation: 1244

Here is a method that relies on using Microsoft.Office.Interop.Excel.

Please Note: The Excel file I used had only one column with data with 50,000 entries.

1) Open the file with Excel, save it as csv, and close Excel.

2) Use StreamReader to quickly read the data.

3) Split the data on carriage return line feed and add it to a string list.

4) Delete the csv file I created.

I used System.Diagnostics.StopWatch to time the execution and it took 1.5568 seconds for the function to run.

public static List<string> ExcelReader( string fileLocation )
{                       
    Microsoft.Office.Interop.Excel.Application excel = new Application();
    Microsoft.Office.Interop.Excel.Workbook workBook =
        excel.Workbooks.Open(fileLocation);
    workBook.SaveAs(
        fileLocation + ".csv",
        Microsoft.Office.Interop.Excel.XlFileFormat.xlCSVWindows
    );
    workBook.Close(true);
    excel.Quit();
    List<string> valueList = null;
    using (StreamReader sr = new StreamReader(fileLocation + ".csv")) {
        string content = sr.ReadToEnd();
        valueList = new List<string>(
            content.Split(
                new string[] {"\r\n"},
                StringSplitOptions.RemoveEmptyEntries
            )
        );
    }
    new FileInfo(fileLocation + ".csv").Delete();
    return valueList;
}

Resources:

http://www.codeproject.com/Articles/5123/Opening-and-Navigating-Excel-with-C

How to split strings on carriage return with C#?

Upvotes: 9

Hitesh
Hitesh

Reputation: 3900

Can you put your code for reading 50000 records using OLEDb provider. I have tried doing that, it took 4-5 seconds to read 50000 records with 3 columns. I have done in following way, just have a look, it may help you out. :)

       // txtPath.Text is the path to the excel file
        string conString = @"Provider=Microsoft.ACE.OLEDB.12.0;" + "Data Source=" + txtPath.Text + ";" + "Extended Properties=" + "\"" + "Excel 12.0;HDR=YES;" + "\"";

        OleDbConnection oleCon = new OleDbConnection(conString);

        OleDbCommand oleCmd = new OleDbCommand("SELECT field1, field2, field3 FROM [Sheet1$]", oleCon);

        DataTable dt = new DataTable();

        oleCon.Open(); 
        dt.Load(oleCmd.ExecuteReader());
        oleCon.Close();

If you can put your code here, so that I can try to rectify. :)

Upvotes: 3

djcrabhat
djcrabhat

Reputation: 474

You just want to read a list of numbers from a file? Does it have to be in Excel? Is some non technical person updating the list? If you want to read 50,000 numbers from a single column in to a list in memory, just copy the cells to a text file and read with a TextReader. It'll be instant.

List<string> ReadFile(string path)
{
   TextReader tr = new StreamReader(path);
   string line;
   List<string> lines = new List<string>();
   while((line=tr.ReadLine())!=null)
   {
       //if this was a CSV, you could string.split(',') here
       lines.add(line);
   }

   return lines;
}

Upvotes: 0

Freelancer
Freelancer

Reputation: 9074

OLEDB will always take more time.

SQL Server 2005/2008 will make it faster.

For OLEDB connections, it takes 7 records per seconds while

For SQLServer , it takes 70 records per seconds.

There requires not much time in reading comma separated files, but time is required to insert the data.

I have literally experienced this thing.

Upvotes: 2

Related Questions