CiccioMiami
CiccioMiami

Reputation: 8256

How to get specific elements from an existing Excel file using C# with the Interop.Excel library?

I need to get data from an Excel file to print them in a HTML table (using MVC, no gridview) and eventually store them in a Database.

The mapping between table and Excel is as follows:

Excel -> Table

First Row -> Table headers

Other Cells -> Table data

I am using the Interop.Excel library that provides methods to manipulate Excel files in .NET.

With this code I obtain in lworkSheet variable the Nth worksheet of the Excel file:

var lworkSheet = xlWorkBook.Worksheets.get_Item(N);

Let's assume the Excel file has just one worksheet (N = 1), I can use worksheet specific properties to get Rows, Columns, Cells and Range. These properties return objects of type Interop.Excel.Range.

The problem is that Rows, Columns and Cells return, respectively, all the rows, columns and cells in the Excel file not just those that are filled with data. Therefore in order to get the data I do (the index of the Excel items are 1-based):

var lheaders = xlWorkSheet.Rows.get_Item(1);
var lexcelItems = new Excel.Range[xlWorkSheet.Rows.Count, xlWorkSheet.Columns.Count];

for (var i=0; i < xlWorkSheet.Rows.Count; i++)
{
    for(var j=0; j < xlWorkSheet.Columns.Count; j++)
    {
        lexcelItems[i,j] = xlWorkSheet.Cells.get_Item(i+2, j+1);
    }
}

Besides the computational waste of cycling all rows and columns, these solution is still not acceptable because the get_Item() method returns Range objects!! In order to get the item in the cell I have to use the get_Range(cell_start, cell_end) method and specify the cells in the "A1", "A2", etc... format.

QUESTIONS:

1) Any way to identify last item in row and column?

2) Any way to get the value in the cell without specify the range?

3) Any library that implements the Excel.Range increment? (i.e. (A1++) == A2, etc...).

4) If none of the above is feasible, is there an easy way to read Excel with OLEDB?

Thanks

Francesco

Upvotes: 0

Views: 3711

Answers (2)

mcw
mcw

Reputation: 3596

Haven't worked with this library so I'm just making an educated guess from the MSDN documentation. Are you looking for Worksheet.UsedRange?

http://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel._worksheet.usedrange(v=office.11).aspx

Once you have the Range, it looks like you can use the Range.Cells property (analogous to the Worksheet.Cells property you're using) to get a range of a single cell and then use the Range.Value or Range.Value2 property to get that cell's value:

http://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel.range_properties(v=office.11).aspx

To specify the cells within the UsedRange based on position rather than row/column names, use Range.Offset:

http://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel.range(v=office.11).aspx

Upvotes: 1

Ekkehard.Horner
Ekkehard.Horner

Reputation: 38745

Using OLEDB to access/process Excel data is better than to loop over .Cells(); putting a (selection of) a sheet with one SQL statement into a database table or .GetString a resultset as a HTML table without any Fors is attractive.

The con: if your data contains garbage, the high level ADO/OLEDB facilities won't give you a chance to interfere.

ADDED:

While doing some experiments wrt what "all" means for an Excel sheet, I stumbled upon an interesting fact? about the "$". Perhaps I'm not the only one being not aware of this:

This is the output from my VBScript test script:

-------------------------------------------------------------------------------
SELECT * FROM [SakAct$]
-------------------------------------------------------------------------------
|actor_id|first_name|last_name|last_update         |
|       3|ED        |CHASE    |2/15/2006 4:34:33 AM|
|       4|JENNIFER  |DAVIS    |2/15/2006 4:34:33 AM|
|       1|PENELOPE  |GUINESS  |2/15/2006 4:34:33 AM|
|       2|NICK      |WAHLBERG |2/15/2006 4:34:33 AM|
|  <NULL>|ED        |CHASE    |2/15/2006 4:34:33 AM|
|       5|          |DAVIS    |2/15/2006 4:34:33 AM|
|       6|PENELOPE  |         |2/15/2006 4:34:33 AM|
|       7|NICK      |WAHLBERG |       <NULL>       |
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
SELECT * FROM [SakAct]
-------------------------------------------------------------------------------
|actor_id|first_name|last_name|last_update         |
|       3|ED        |CHASE    |2/15/2006 4:34:33 AM|
|       4|JENNIFER  |DAVIS    |2/15/2006 4:34:33 AM|
|       1|PENELOPE  |GUINESS  |2/15/2006 4:34:33 AM|
|       2|NICK      |WAHLBERG |2/15/2006 4:34:33 AM|
-------------------------------------------------------------------------------

Looks like [Sheet$] turns on an Excel Quirks mode, that will consider more rows than the stricter approach enabled by [Sheet].

Upvotes: 1

Related Questions