DethoRhyne
DethoRhyne

Reputation: 980

How to optimize reading from Excel table in C#?

I'm working on an application that will parse approx 5500x9 cell range, I did manage to get the basics somehow working but I'm very new at this, and the solution is very basic and takes a LOT of time even to get 100 rows, let alone 5.5k, and now I'm stuck here because any tutorial I've checked so far isn't really helpful, or doesn't give any better performance compared to my current code.

        Excel.Application xlApp;
        Excel.Workbook xlWorkBook;
        Excel.Worksheet xlWorkSheet;

        xlApp = new Excel.Application();
        xlWorkBook = xlApp.Workbooks.Open(@"C:\...\report.xlsx", 0, true, 5, "", "", true, Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
        xlWorkSheet = (Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);

        Excel.Range range = xlWorkSheet.UsedRange;
        int rows = range.Rows.Count;

        for (int i = 2; i <= 15; i++)
        {
            Devices.Add(new DeviceInfo((string)(range.Cells[i, 1] as Excel.Range).Value2, (string)(range.Cells[i, 2] as Excel.Range).Value2, (string)(range.Cells[i, 3] as Excel.Range).Value2, (string)(range.Cells[i, 4] as Excel.Range).Value2, (string)(range.Cells[i, 5] as Excel.Range).Value2, (string)(range.Cells[i, 6] as Excel.Range).Value2, (string)(range.Cells[i, 7] as Excel.Range).Value2, (string)(range.Cells[i, 8] as Excel.Range).Value2, (string)(range.Cells[i, 9] as Excel.Range).Value2, (string)(range.Cells[i, 10] as Excel.Range).Value2, Convert.ToDateTime((range.Cells[i, 11] as Excel.Range).Value2), Convert.ToDateTime((range.Cells[i, 12] as Excel.Range).Value2)));
        }

        xlWorkBook.Close(false, System.Reflection.Missing.Value, System.Reflection.Missing.Value);
        xlApp.Quit();

The only thing that comes to mind is reading entire row as a single range then process that data inside the constructor. However, I don't believe that will either be practical since it will still take a lot of time to get all the data.

DeviceInfo class is just a list of properties that match columns inside the Excel worksheet.

Upvotes: 1

Views: 273

Answers (1)

Shiva
Shiva

Reputation: 20935

I too had a similar requirement recently and ended up using EPPlus 4.1.0

It's an excellent Library, been around for a while, is well documented, and actively maintained.

enter image description here

You can install the Nuget package using the Package Manager Console.

enter image description here

Their Samples Solution covers all use cases with code, and you could literally copy-paste and just modify few things and you will be good to go.

And yes, it's blazing fast!**

Here's a sample code for your use-case. Taken from the Samples Project on the Codeplex site.

/*******************************************************************************
 * You may amend and distribute as you like, but don't remove this header!
 * 
 * All rights reserved.
 * 
 * EPPlus is an Open Source project provided under the 
 * GNU General Public License (GPL) as published by the 
 * Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 * 
 * EPPlus provides server-side generation of Excel 2007 spreadsheets.
 * See http://www.codeplex.com/EPPlus for details.
 *
 *
 * 
 * The GNU General Public License can be viewed at http://www.opensource.org/licenses/gpl-license.php
 * If you unfamiliar with this license or have questions about it, here is an http://www.gnu.org/licenses/gpl-faq.html
 * 
 * The code for this project may be used and redistributed by any means PROVIDING it is 
 * not sold for profit without the author's written consent, and providing that this notice 
 * and the author's name and all copyright notices remain intact.
 * 
 * All code and executables are provided "as is" with no warranty either express or implied. 
 * The author accepts no liability for any damage or loss of business that this product may cause.
 *
 *
 * Code change notes:
 * 
 * Author                           Change                      Date
 *******************************************************************************
 * Jan Källman      Added       10-SEP-2009
 *******************************************************************************/
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using OfficeOpenXml;

namespace EPPlusSamples
{
    /// <summary>
    /// Simply opens an existing file and reads some values and properties
    /// </summary>
    class Sample2
    {
        public static void RunSample2(string FilePath)
        {
            Console.WriteLine("Reading column 2 of {0}", FilePath);
            Console.WriteLine();

            FileInfo existingFile = new FileInfo(FilePath);
            using (ExcelPackage package = new ExcelPackage(existingFile))
            {
                // get the first worksheet in the workbook
                ExcelWorksheet worksheet = package.Workbook.Worksheets[1];
                int col = 2; //The item description
                // output the data in column 2
                for (int row = 2; row < 5; row++)
                    Console.WriteLine("\tCell({0},{1}).Value={2}", row, col, worksheet.Cells[row, col].Value);

                // output the formula in row 5
                Console.WriteLine("\tCell({0},{1}).Formula={2}", 3, 5, worksheet.Cells[3, 5].Formula);                
                Console.WriteLine("\tCell({0},{1}).FormulaR1C1={2}", 3, 5, worksheet.Cells[3, 5].FormulaR1C1);

                // output the formula in row 5
                Console.WriteLine("\tCell({0},{1}).Formula={2}", 5, 3, worksheet.Cells[5, 3].Formula);
                Console.WriteLine("\tCell({0},{1}).FormulaR1C1={2}", 5, 3, worksheet.Cells[5, 3].FormulaR1C1);

            } // the using statement automatically calls Dispose() which closes the package.

            Console.WriteLine();
            Console.WriteLine("Sample 2 complete");
            Console.WriteLine();
        }
    }
}

( ** = If you find the performance for your use case to be unacceptable, then you can take a look at this fork of EPPlus that fixes some of the performance issues.

https://github.com/RadoslavGatev/EPPlus-Performance

Mind you, these performance issues are for extremely large excel datasets -- we are talking 50,000+ cells in 100s and 1000s of tables - and most use cases wouldn't encounter it. )

Upvotes: 4

Related Questions