Firdavs Kurbonov
Firdavs Kurbonov

Reputation: 1252

How to optimize the loop through each row in Excel using C#?

I have around 6000 rows and 600 columns in my Excel worksheet. There is an ID column where I am highlighting differentiated values in all cells if the IDs are equal. But my code is taking too long to loop through each row and highlight. How my code could be optimized?

 private void worker_DoWork(object sender, System.ComponentModel.DoWorkEventArgs e)
{
    for (r = 2; r <= totalRow; r++)
    {
        Duplicate();
        int percentage = (r + 1) * 100 / totalRow;
        worker.ReportProgress(percentage);
    }             
}

private void Duplicate()
{
    if (xlWorksheet.Cells[r, ssid].Value == xlWorksheet.Cells[r + 1, ssid].Value)
    {
        for (int c = 1; c <= columnCount; c++)
        {
            if (Convert.ToString(xlWorksheet.Cells[r, c].Value) != Convert.ToString(xlWorksheet.Cells[r + 1, c].Value))
            {
                Excel.Range cellRange = (Excel.Range)xlWorksheet.Cells[r + 1, c];
                cellRange.Interior.Color = Excel.XlRgbColor.rgbRed;
            }
        }
    }
}

Upvotes: 2

Views: 3224

Answers (2)

Vityata
Vityata

Reputation: 43585

Reading and writing to excel is what slows the program quite a bit. Try to avoid the reading and the unnecessary writing:

  • Read the Excel file once and write its values in a 2-dimensional matrix, with dimensions row and column;
  • Loop through the matrix and check the values. For 3.6M it will be fast;
  • Record the results in another 2-dimensional boolean matrix.
  • Loop through the booelan matrix and write only on the cells which you need to write;
  • You can even do something better - loop through the boolean matrix and assign the corresponding cells to a range with Union(). At the end change the color of the range only once.

Following the above points (without the last one), this is a basic start:

enter image description here

And this is a basic end, considering that our task is to color equal cells for every row (this task can be easily changed):

enter image description here

This is the code:

using System;
using Excel = Microsoft.Office.Interop.Excel;

class StartUp
{
    static void Main()
    {
        string filePath = @"C:\Sample.xlsx";

        int rowsCount = 5;
        int colsCount = 6;

        Excel.Application excel = new Excel.Application();
        excel.Visible = false; 
        excel.EnableAnimations = false;

        Excel.Workbook wkb = Open(excel, filePath);
        Excel.Worksheet wk = (Excel.Worksheet)excel.Worksheets.get_Item(1);

        Excel.Range startCell = wk.Cells[1, 1];
        Excel.Range endCell = wk.Cells[rowsCount, colsCount];
        Excel.Range currentRange = wk.get_Range(startCell, endCell).Cells;
        currentRange.Interior.Color = Excel.XlRgbColor.rgbWhite;

        object[,]  matrixRead = (object[,])currentRange.Value;
        bool[,] matrixResult = new bool[rowsCount+1,colsCount+1];

        for (int rows = 1; rows <= rowsCount; rows++)
        {
            for (int cols = 1; cols < colsCount; cols++)
            {
                if (matrixRead[rows,cols].ToString()==matrixRead[rows,cols+1].ToString())
                {
                    matrixResult[rows, cols] = true;
                    matrixResult[rows, cols + 1] = true;
                }
            }
        }

        for (int rows = 1; rows <= rowsCount; rows++)
        {
            for (int cols = 1; cols <= colsCount; cols++)
            {
                if (matrixResult[rows, cols])
                {
                    currentRange.Cells[rows, cols].interior.color = 
                                                Excel.XlRgbColor.rgbRed;
                }                
            }
        }

        excel.EnableAnimations = true;
        wkb.Close(true);
        excel.Quit();
        Console.WriteLine("Finished!");
    }

    private static Excel.Workbook Open(Excel.Application excelInstance, 
            string fileName, bool readOnly = false, 
            bool editable = true, bool updateLinks = true)
    {
        Excel.Workbook book = excelInstance.Workbooks.Open(
            fileName, updateLinks, readOnly,
            Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
            Type.Missing, editable, Type.Missing, Type.Missing, Type.Missing,
            Type.Missing, Type.Missing);
        return book;
    }
}

About 95% of the speed (or much more by 3.6M cells) probably comes from these two lines:

excel.Visible = false; 
excel.EnableAnimations = false;

In general the two pairs of nested loops can be avoided and made into one pair quite easily - like this:

for (int rows = 1; rows <= rowsCount; rows++)
{
    for (int cols = 1; cols < colsCount; cols++)
    {
        if (matrixRead[rows,cols].ToString()==matrixRead[rows,cols+1].ToString())
        {
            currentRange.Cells[rows, cols].interior.color = Excel.XlRgbColor.rgbRed;
            currentRange.Cells[rows, cols+1].interior.color = Excel.XlRgbColor.rgbRed;
        }
    }
}

But the idea was to present the matrixResult as one range through Union() and update the background of the whole range at once. Thus, the second pair of nested loops should not exist, but instead it should be something like: currentRange.Interior.Color = Excel.XlRgbColor.rgbRed

Upvotes: 3

dotnetbasic
dotnetbasic

Reputation: 11

try to read the whole excel file in dataset. then you can use below code to

Basic Example :

DataSet ResultDataSet= new DataSet();
DataTable customers = myDataset.Tables.Add("Customers");
customers.Columns.Add("Name");
customers.Columns.Add("Age");
customers.Rows.Add("Chris", "25");
int records=10;   // split the  10 records per table. if 50 records will be there then 5 tables will generate.

var splittedTables = ResultDataSet.AsEnumerable()
                                .Select((row, index) => new { row, index })
                                .GroupBy(x => x.index / records)
                                .Select(g => g.Select(x => x.row).CopyToDataTable())
                                .ToArray();

try this can work fast compare to foreach statement.

Upvotes: -1

Related Questions