Reputation: 1935
I am trying to read an Excel file into a list of Data.DataTable, although with my current method it can take a very long time. I essentually go Worksheet by Worksheet, cell by cell, and it tends to take a very long time. Is there a quicker way of doing this? Here is my code:
List<DataTable> List = new List<DataTable>();
// Counting sheets
for (int count = 1; count < WB.Worksheets.Count; ++count)
{
// Create a new DataTable for every Worksheet
DATA.DataTable DT = new DataTable();
WS = (EXCEL.Worksheet)WB.Worksheets.get_Item(count);
textBox1.Text = count.ToString();
// Get range of the worksheet
Range = WS.UsedRange;
// Create new Column in DataTable
for (cCnt = 1; cCnt <= Range.Columns.Count; cCnt++)
{
textBox3.Text = cCnt.ToString();
Column = new DataColumn();
Column.DataType = System.Type.GetType("System.String");
Column.ColumnName = cCnt.ToString();
DT.Columns.Add(Column);
// Create row for Data Table
for (rCnt = 0; rCnt <= Range.Rows.Count; rCnt++)
{
textBox2.Text = rCnt.ToString();
try
{
cellVal = (string)(Range.Cells[rCnt, cCnt] as EXCEL.Range).Value2;
}
catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException)
{
ConvertVal = (double)(Range.Cells[rCnt, cCnt] as EXCEL.Range).Value2;
cellVal = ConvertVal.ToString();
}
// Add to the DataTable
if (cCnt == 1)
{
Row = DT.NewRow();
Row[cCnt.ToString()] = cellVal;
DT.Rows.Add(Row);
}
else
{
Row = DT.Rows[rCnt];
Row[cCnt.ToString()] = cellVal;
}
}
}
// Add DT to the list. Then go to the next sheet in the Excel Workbook
List.Add(DT);
}
Upvotes: 34
Views: 175948
Reputation: 3811
I know it's an old topic, but there's an easy and fast way to do it by MiniExcel library like following code
var filePath = @".....demo.xlsx";
List<DataTable> list = new List<DataTable>();
var sheets = MiniExcel.GetSheetNames(filePath);
foreach (var sheetName in MiniExcel.GetSheetNames(filePath))
list.Add(MiniExcel.QueryAsDataTable(filePath,true,sheetName:sheetName));
Best regards
Upvotes: 0
Reputation: 375
class DataReader
{
Excel.Application xlApp;
Excel.Workbook xlBook;
Excel.Range xlRange;
Excel.Worksheet xlSheet;
public DataTable GetSheetDataAsDataTable(String filePath, String sheetName)
{
DataTable dt = new DataTable();
try
{
xlApp = new Excel.Application();
xlBook = xlApp.Workbooks.Open(filePath);
xlSheet = xlBook.Worksheets[sheetName];
xlRange = xlSheet.UsedRange;
DataRow row=null;
for (int i = 1; i <= xlRange.Rows.Count; i++)
{
if (i != 1)
row = dt.NewRow();
for (int j = 1; j <= xlRange.Columns.Count; j++)
{
if (i == 1)
dt.Columns.Add(xlRange.Cells[1, j].value);
else
row[j-1] = xlRange.Cells[i, j].value;
}
if(row !=null)
dt.Rows.Add(row);
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
finally
{
xlBook.Close();
xlApp.Quit();
}
return dt;
}
}
Upvotes: 1
Reputation: 152521
Caling .Value2
is an expensive operation because it's a COM-interop call. I would instead read the entire range into an array and then loop through the array:
object[,] data = Range.Value2;
// Create new Column in DataTable
for (int cCnt = 1; cCnt <= Range.Columns.Count; cCnt++)
{
textBox3.Text = cCnt.ToString();
var Column = new DataColumn();
Column.DataType = System.Type.GetType("System.String");
Column.ColumnName = cCnt.ToString();
DT.Columns.Add(Column);
// Create row for Data Table
for (int rCnt = 1; rCnt <= Range.Rows.Count; rCnt++)
{
textBox2.Text = rCnt.ToString();
string CellVal = String.Empty;
try
{
cellVal = (string)(data[rCnt, cCnt]);
}
catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException)
{
ConvertVal = (double)(data[rCnt, cCnt]);
cellVal = ConvertVal.ToString();
}
DataRow Row;
// Add to the DataTable
if (cCnt == 1)
{
Row = DT.NewRow();
Row[cCnt.ToString()] = cellVal;
DT.Rows.Add(Row);
}
else
{
Row = DT.Rows[rCnt + 1];
Row[cCnt.ToString()] = cellVal;
}
}
}
Upvotes: 27
Reputation: 89
Dim sSheetName As String
Dim sConnection As String
Dim dtTablesList As DataTable
Dim oleExcelCommand As OleDbCommand
Dim oleExcelReader As OleDbDataReader
Dim oleExcelConnection As OleDbConnection
sConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Test.xls;Extended Properties=""Excel 12.0;HDR=No;IMEX=1"""
oleExcelConnection = New OleDbConnection(sConnection)
oleExcelConnection.Open()
dtTablesList = oleExcelConnection.GetSchema("Tables")
If dtTablesList.Rows.Count > 0 Then
sSheetName = dtTablesList.Rows(0)("TABLE_NAME").ToString
End If
dtTablesList.Clear()
dtTablesList.Dispose()
If sSheetName <> "" Then
oleExcelCommand = oleExcelConnection.CreateCommand()
oleExcelCommand.CommandText = "Select * From [" & sSheetName & "]"
oleExcelCommand.CommandType = CommandType.Text
oleExcelReader = oleExcelCommand.ExecuteReader
nOutputRow = 0
While oleExcelReader.Read
End While
oleExcelReader.Close()
End If
oleExcelConnection.Close()
Upvotes: 2
Reputation: 1528
In case anyone else is using EPPlus. This implementation is pretty naive, but there are comments that draw attention to such. If you were to layer one more method GetWorkbookAsDataSet()
on top it would do what the OP is asking for.
/// <summary>
/// Assumption: Worksheet is in table format with no weird padding or blank column headers.
///
/// Assertion: Duplicate column names will be aliased by appending a sequence number (eg. Column, Column1, Column2)
/// </summary>
/// <param name="worksheet"></param>
/// <returns></returns>
public static DataTable GetWorksheetAsDataTable(ExcelWorksheet worksheet)
{
var dt = new DataTable(worksheet.Name);
dt.Columns.AddRange(GetDataColumns(worksheet).ToArray());
var headerOffset = 1; //have to skip header row
var width = dt.Columns.Count;
var depth = GetTableDepth(worksheet, headerOffset);
for (var i = 1; i <= depth; i++)
{
var row = dt.NewRow();
for (var j = 1; j <= width; j++)
{
var currentValue = worksheet.Cells[i + headerOffset, j].Value;
//have to decrement b/c excel is 1 based and datatable is 0 based.
row[j - 1] = currentValue == null ? null : currentValue.ToString();
}
dt.Rows.Add(row);
}
return dt;
}
/// <summary>
/// Assumption: There are no null or empty cells in the first column
/// </summary>
/// <param name="worksheet"></param>
/// <returns></returns>
private static int GetTableDepth(ExcelWorksheet worksheet, int headerOffset)
{
var i = 1;
var j = 1;
var cellValue = worksheet.Cells[i + headerOffset, j].Value;
while (cellValue != null)
{
i++;
cellValue = worksheet.Cells[i + headerOffset, j].Value;
}
return i - 1; //subtract one because we're going from rownumber (1 based) to depth (0 based)
}
private static IEnumerable<DataColumn> GetDataColumns(ExcelWorksheet worksheet)
{
return GatherColumnNames(worksheet).Select(x => new DataColumn(x));
}
private static IEnumerable<string> GatherColumnNames(ExcelWorksheet worksheet)
{
var columns = new List<string>();
var i = 1;
var j = 1;
var columnName = worksheet.Cells[i, j].Value;
while (columnName != null)
{
columns.Add(GetUniqueColumnName(columns, columnName.ToString()));
j++;
columnName = worksheet.Cells[i, j].Value;
}
return columns;
}
private static string GetUniqueColumnName(IEnumerable<string> columnNames, string columnName)
{
var colName = columnName;
var i = 1;
while (columnNames.Contains(colName))
{
colName = columnName + i.ToString();
i++;
}
return colName;
}
Upvotes: 8
Reputation: 2125
MS Office Interop is slow and even Microsoft does not recommend Interop usage on server side and cannot be use to import large Excel files. For more details see why not to use OLE Automation from Microsoft point of view.
Instead, you can use any Excel library, like EasyXLS for example. This is a code sample that shows how to read the Excel file:
ExcelDocument workbook = new ExcelDocument();
DataSet ds = workbook.easy_ReadXLSActiveSheet_AsDataSet("excel.xls");
DataTable dataTable = ds.Tables[0];
If your Excel file has multiple sheets or for importing only ranges of cells (for better performances) take a look to more code samples on how to import Excel to DataTable in C# using EasyXLS.
Upvotes: 3
Reputation: 3563
Please check out the below links
http://www.codeproject.com/Questions/376355/import-MS-Excel-to-datatable (6 solutions posted)
Best /Fastest way to read an Excel Sheet into a DataTable?
Upvotes: 4