Reynier
Reynier

Reputation: 2478

Read a huge Excel column from C#

I've a Excel (could be 2010 or 2013 don't know if this can be a problem later) document with four columns. The first three columns store phone numbers which basically are a string with 10 or more characters. The four column only and forever will store 1, 2, 3 or 4 and it's a category. I need to check if each number in column A appears in column B and C or not so I think in read all the Excel cells for each column and store in a list (isn't implemented yet because the problem I'll explain below). For that I make this code:

private void btnCargarExcel_Click(object sender, EventArgs e)
        {
            if (this.openFileDialog1.ShowDialog() == DialogResult.OK)
            {

                if (System.IO.File.Exists(openFileDialog1.FileName))
                {
                    filePath.Text = openFileDialog1.FileName.ToString();

                    Excel.Application xlApp;
                    Excel.Workbook xlWorkBook;
                    Excel.Worksheet xlWorkSheet;
                    Excel.Range range;

                    string str;

                    int rCnt = 0;

                    xlApp = new Microsoft.Office.Interop.Excel.Application();
                    xlWorkBook = xlApp.Workbooks.Open(openFileDialog1.FileName, 0, true, 5, "", "", true, Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
                    xlWorkSheet = (Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);

                    range = xlWorkSheet.UsedRange;

                    for (rCnt = 1; rCnt <= range.Rows.Count; rCnt++)
                    {
                        str = (range.Cells[rCnt, 1] as Excel.Range).Value2.ToString();
                        //bd.Add(cleanString(str));

                        bd.Add(cleanString(str, 10));
                    }

                    for (rCnt = 1; rCnt <= range.Rows.Count; rCnt++)
                    {
                        str = (range.Cells[rCnt, 2] as Excel.Range).Value2.ToString();
                        //bd.Add(cleanString(str));

                        bl.Add(cleanString(str, 10));
                    }

                    for (rCnt = 1; rCnt <= range.Rows.Count; rCnt++)
                    {
                        str = (range.Cells[rCnt, 3] as Excel.Range).Value2.ToString();
                        //bd.Add(cleanString(str));

                        cm.Add(cleanString(str, 10));
                    }

                    nrosProcesados.Text = bd.Count().ToString();
                    listBox1.DataSource = bd;

                    noProcesadosBL.Text = bl.Count().ToString();
                    listBox2.DataSource = bl;

                    noProcesadosCM.Text = cm.Count().ToString();
                    listBox3.DataSource = cm;

                    xlWorkBook.Close(true, null, null);
                    xlApp.Quit();

                    releaseObject(xlWorkSheet);
                    releaseObject(xlWorkBook);
                    releaseObject(xlApp);
                }
                else
                {
                    MessageBox.Show("No se pudo abrir el fichero!");
                    System.Runtime.InteropServices.Marshal.ReleaseComObject(appExcel);
                    appExcel = null;
                    System.Windows.Forms.Application.Exit();
                }
            }
        }

So I iterate trough cells in a column and store every number in a list after make some strings changes as you can see in code. The problem here is that column A has 797340 cells, column B has 91617 cells and column C has 95891 cells so if I run the application, load the Excel and wait my PC hang out (even having 12GB of RAM and with Core i3 processor) and I need to open Task Manager and end the task. What is the best solution in order to get what I want (leave only the non repeated numbers) and not to hang out my PC? Will be fine divide things in separately threads (I don't know much about this because I'm starting in C# so any help will be appreciated) for each cycle? What is yours opinion about this topic?

EDIT: Add a new and clean method

So after read and read and get help from some members here I improve the code a bit but now I've another problem (commented below the code). See the code now:

// this goes first when I declare vars
public static System.Array objRowAValues;

// this goes in action when I click the button (I leave only relevant part)
Excel.Application xlApp;
Excel.Workbook xlWorkBook;
Excel.Worksheet xlWorkSheet;
Excel.Range range, rngARowLast;

string str;
int rCnt = 0;

long lastACell, fullRow;

xlApp = new Microsoft.Office.Interop.Excel.Application();
xlWorkBook = xlApp.Workbooks.Open(openFileDialog1.FileName, 0, true, 5, "", "", true, Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
xlWorkSheet = (Excel.Worksheet) xlWorkBook.Worksheets.get_Item(1);

range = xlWorkSheet.UsedRange;

fullRow = xlWorkSheet.Rows.Count;
lastACell = xlWorkSheet.Cells[fullRow, 1].End(Excel.XlDirection.xlUp).Row;
rngARowLast = xlWorkSheet.get_Range("A1", "A" + lastACell);
objRowAValues = (System.Array) rngARowLast.Cells.Value;

Now because I'll fill a ListBox with values from objRowAValues and ListBox only accepts List as DataSource then I need to convert objRowAValues to a List of string. I try this but itsn't working for me. Any help?

Upvotes: 0

Views: 2155

Answers (1)

Christian Sauer
Christian Sauer

Reputation: 10899

Unfortunately, I am more of a VB.NET guy - so I have converted some code for you. I hope this works out of the box - I do not use this kind of tool here, so I had no way to test it.

public void test()
{
    object[,] RaWData = null;

    dynamic range = xlWorkSheet.UsedRange;

    //i am unsure here about the correct order - I do not work with excel at Work, so you might have to change the following lange, if columns needs to be before rows or so
    RaWData = range.value2;

    //I am using a list here, because Lists are a lot easier to work with then simple arrays
    List<List<string>> RealData = new List<List<string>>();

    //start at 1  because the excel-delivered array do not have values at index 0 - this is the only 1-based array you will ever encounter in .net
    for (x = 1; x <= Information.UBound(RaWData, 1); x++) {
        List<string> templist = new List<string>();
        for (y = 1; y <= Information.UBound(RaWData, 2); y++) {
            templist.Add(RaWData[x, y].ToString());
        }
        RealData.Add(templist);
    }

    //you should be finished here...
}

Upvotes: 1

Related Questions