Reputation: 861
I'm wanting to read an Excel 2007+ document using the OpenXML SDK v2.0 in an SAX-like manner. I am using this blog post as a rough guide: http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx
However, in my document I have a mix of strings and numeric values. Therefore, the string values are being stored as a SharedString, so when the CellValue is read for such a cell, I get a numeric number, which I have read is the index (and thus need to grab the InnerText). This seems to be adding too much complications. Is there anyway for me to simply treat all cells in the worksheet as text / strings, and iterate through all the cells grabbing the values, in a manner similar to the blog posts example?
Thanks
Upvotes: 3
Views: 6833
Reputation: 3166
Would the following be helpful?
List<string> listShared = new List<string>();
using (SpreadsheetDocument xl = SpreadsheetDocument.Open("YourFile.xlsx", false))
{
SharedStringItem ssi;
using (OpenXmlReader oxrShared = OpenXmlReader.Create(xl.WorkbookPart.SharedStringTablePart))
{
while (oxrShared.Read())
{
if (oxrShared.ElementType == typeof(SharedStringItem))
{
ssi = (SharedStringItem)oxrShared.LoadCurrentElement();
// this assumes the shared string is a simple text format, instead of rich text.
listShared.Add(ssi.Text.Text);
}
}
}
WorksheetPart wsp = xl.WorkbookPart.WorksheetParts.First();
Cell c;
using (OpenXmlReader oxrCells = OpenXmlReader.Create(wsp))
{
while (oxrCells.Read())
{
if (oxrCells.ElementType == typeof(Cell))
{
c = (Cell)oxrCells.LoadCurrentElement();
// c.CellReference holds a string such as "A1"
if (c.DataType != null)
{
if (c.DataType == CellValues.SharedString)
{
// use whichever from-string-to-number conversion
// you like.
//listShared[Convert.ToInt32(c.CellValue.Text)];
}
else if (c.DataType == CellValues.Number)
{
// "normal" value
//c.CellValue.Text;
}
// there's also boolean, which you might be interested
// as well as other types
}
else
{
// is by default a Number. Use this:
//c.CellValue.Text;
}
}
}
}
}
Note: there are no error bound checks, or nullity checks. It's meant to illustrate how to get at the shared strings in the simplest minimal manner possible.
Also, the list of shared strings is assumed to be "simple" shared strings, meaning no rich text.
The logic is that you load up the list of shared strings in the worksheet into a List that you can easily manipulate. Then as you iterate through the cells, you can just check on the List again if you see a cell of data type SharedString. If the cell is of data type Number, then just proceed as usual.
Upvotes: 11