Avrigeanu Laurian
Avrigeanu Laurian

Reputation: 85

Excel to DataTable Interop

Before you ask, no, I cannot use ExcelDataReader, EPPlus or anything that will read/write directly from memory, the reason is an encryption software haunting my every moment of my life.

What I am doing right now is using the Excel Interop to open excel, copy every cell in the workbook to the clipboard formatted as Text, then creating a datatable from it like you see here:

    public static DataTable WorkbookToDataTable(string filePath, int sheet = 1, int skipfirstRows = 0)
    {
        var app = new Application();
        app.Visible = true;
        var workbook = app.Workbooks.Open(filePath);
        var worksheet = workbook.Worksheets[sheet] as Worksheet;
        Range range = worksheet?.UsedRange;
        range?.Select();
        range?.Copy();

        //get clipboard data
        var clipboardData = Clipboard.GetText(TextDataFormat.Text);
        var rows = clipboardData.Split('\r');

        var dataTable = new DataTable();
        foreach (var item in rows.Skip(skipfirstRows))
        {
            dataTable.Columns.Add(item);
        }
        foreach (var item in rows.Skip(skipfirstRows+2))
        {
            var dtRow = dataTable.NewRow();
            dtRow.ItemArray = item.Split("\t").ToArray<object>();
            dataTable.Rows.Add(dtRow);
        }

        workbook.Close();
        app.Quit();
        return dataTable;
        
    }

The problem is it's too slow, isn't there any way to optimize this to be a bit quicker? After all I already have the data loaded in after 1 second of the workbook opening, but it takes so much more to transfer the data into a datatable in the foreach that I create the datatable rows..

I didn't find any other questions answered regarding this data conversion on stackoverflow that did not already recommend other libraries.

Upvotes: 0

Views: 2184

Answers (1)

mattjr747
mattjr747

Reputation: 96

I would avoid using "select" statements at almost any cost. They are almost never necessary. You can copy that UsedRange without selecting it first.

But is seems to me that the bigger problem here is using the clipboard at all. You can load the data directly into an array or even directly into your datatable.

Try something more like this:

public static DataTable WorkbookToDataTable(string filePath, int sheet = 1, int skipfirstRows = 0)
{
   var app = new Application();
   app.Visible = true;
   Workbook workbook = app.Workbooks.Open(filePath);
   Worksheet worksheet = workbook.Worksheets[sheet];

   //Gets the entire used range as a 2D array
   object[,] rangeAsArray = (object[,])worksheet?.UsedRange.Value2;

   List<string> newRow = new List<string>();
   var dataTable = new DataTable();
       
   for (int rowNum = 1; rowNum <= rowArray.GetUpperBound(0); rowNum++)
   {
      for (int columnNum = 1; columnNum<= rowArray.GetUpperBound(1); columnNum++)
      {
         // In my solution, the first row of the table is assumed to be header rows.
         // So the first row's items will be the name of each column
         if (rowNum == 1)
         {
            dataTable.Columns.Add(new System.Data.DataColumn(rowArray[rowNum, columnNum].ToString(), typeof(object)));
         }
         else if (rowArray[rowNum, columnNum] != null)
         {
            newRow.Add(rowArray[rowNum, columnNum].ToString());
         }
      }
   }

   if(rowNum != 1)
   {
      dataTable.Rows.Add(newRow);
      newRow = new List<string>();
   }

   workbook.Close();
   app.Quit();
   return dataTable;        
}

I am sure there is an even better way to do this (I tried to avoid the nested for loops and couldn't find a way), but I timed these solutions: On my machine, my solution was consistently 1/4th the time, often 1/5th the time.

Also you would have to adapt your "skipfirstRows" functionality. I didn't quite understand your intention there.

Hope this helps!

Upvotes: 1

Related Questions