Reputation: 150
I have been struggling to find a solution on how to read a large xlsx file with OpenXml. I have tried the microsoft samples without luck. I simply need to read an excel file into a DataTable in c#. I am not concerned with value types in the datatable, everything can be stored as a string values.
The samples I have found so far don't retain the structure of the spreadsheet and only return the values of the cells.
Any ideas?
Upvotes: 2
Views: 4748
Reputation: 28290
You can get DataTable this way:
using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(fileName, false))
{
DataTable data = ToDataTable(spreadsheet, "Employees");
}
This method will read Excel sheet data as DataTable
public DataTable ToDataTable(SpreadsheetDocument spreadsheet, string worksheetName)
{
var workbookPart = spreadsheet.WorkbookPart;
var sheet = workbookPart
.Workbook
.Descendants<Sheet>()
.FirstOrDefault(s => s.Name == worksheetName);
var worksheetPart = sheet == null
? null
: workbookPart.GetPartById(sheet.Id) as WorksheetPart;
var dataTable = new DataTable();
if (worksheetPart != null)
{
var sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();
foreach (Row row in sheetData.Descendants<Row>())
{
var values = row
.Descendants<Cell>()
.Select(cell =>
{
var value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = workbookPart
.SharedStringTablePart
.SharedStringTable
.ChildElements[int.Parse(value)]
.InnerText;
}
return (object)value;
})
.ToArray();
dataTable.Rows.Add(values);
}
}
return dataTable;
}
Upvotes: 0
Reputation: 3479
The open xml SDK can be a little hard to understand. However, I have found it useful to use http://simpleooxml.codeplex.com/ this code plex project. It adds a thin layer over the sdk to more easily parse through excel files and work with styles.
Then you can use something like the following with their worksheet reader to recurse through and grab the values you want
System.IO.MemoryStream ms = Utility.StreamToMemory(xslxTemplate);
using (SpreadsheetDocument document = SpreadsheetDocument.Open(ms, true))
{
IEnumerable<Sheet> sheets = document.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
if (sheets.Count() == 0)
{
// The specified worksheet does not exist.
return null;
}
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(relationshipId);
string myval =WorksheetReader.GetCell("A", 0, worksheetPart).CellValue.InnerText;
// Put in a loop to go through contents of document
}
Upvotes: 1