Reputation: 1499
I am running some openXML based code in c sharp that's stuck on the first row of data and just loops it again and again. I'm clear that I need to incorporate the row variable into the mix but have tried various methods to no avail. Anyone have any ideas on this?
In the code block below, sst.ChildElements[7].InnerText gets back the content of the first row column 7 but each time the row loops its the content from the same CELL! I want to move onto the next row :-(
string fileName = @"c:\temp\accountData.xlsx";
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fs, false))
{
WorkbookPart workbookPart = doc.WorkbookPart;
SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
SharedStringTable sst = sstpart.SharedStringTable;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
Worksheet sheet = worksheetPart.Worksheet;
var cells = sheet.Descendants<Cell>();
var rows = sheet.Descendants<Row>();
Console.WriteLine("Row count = {0}", rows.LongCount());
Console.WriteLine("Cell count = {0}", cells.LongCount());
CompanyProvider cp = _db.GetCompanyProvider();
int i = 0;
// Or... via each row
foreach (Row row in rows.ToList())
{
if (i == 0)
i = i + 1;
else
{
CustomerAddress customerAddress = new CustomerAddress();
customerAddress.AddressLine1 = sst.ChildElements[7].InnerText; // Code hidden for brevity
i = i + 1;
}
}
}
}
Upvotes: 1
Views: 1702
Reputation: 12815
In the code block below, sst.ChildElements[7].InnerText gets back the content of the first row column 7 but each time the row loops its the content from the same CELL!
This is because you are always reading the value from element 7
in the SharedStringsTable
(your sst
variable) and not the value from element 7 in the Row
(your row
variable).
The Shared Strings Table is a mechanism used within OpenXML to prevent duplicate data appearing in cells (to reduce the file size). Instead of a cell containing a string value directly it can instead contain an integer which is the index into the shared strings table. That way if a string is repeated many times within an Excel file it is only stored once but with many references to that string.
You can tell whether or not a Cell
object contains a shared string index by looking at its DataType
property (note that strings can be stored in-line and other datatypes such as numbers are always stored in-line).
If a cell does hold a shared string index then you can use that value to index your sst
property to get the correct content:
sst.ChildElements[<cell content here>].InnerText
To get the rows cells by index you can grab the child Cell
s of the Row
and then use the Enumerable<T>.ElementAt
method to get the Cell
at the index you require:
row.Elements<Cell>().ElementAt(7); //gives the 8th Cell in row - read the "HOWEVER" section!!
Using the above, your foreach
would then become something like:
foreach (Row row in rows.ToList())
{
if (i == 0)
i = i + 1;
else
{
//get the cell at index 7
Cell cell = row.Elements<Cell>().ElementAt(7); //read the warning below
//check the type
if (cell.DataType != null && cell.DataType == CellValues.SharedString)
{
//it's a shared string so use the cell inner text as the index into the
//shared strings table
Console.WriteLine(sst.ChildElements[int.Parse(cell.InnerText)].InnerText);
}
else
{
//it's NOT a shared string, output the value directly
Console.WriteLine(cell.InnerText);
}
i = i + 1;
}
}
HOWEVER...
The above code will work BUT indexing the cells in the way you are trying to do is error prone. The OpenXML schema allows for empty cells (and rows) to be omitted from the file. This means if you have an empty cell somewhere you may end up picking up the wrong value.
For example, I created a file in Excel with the following structure:
Running the above code on that file produces the output:
9
8
Note that on the first row we parse we end up getting the value from I2
but on the second row we read we get the value from H3
. The first row we read is out by one (assuming that by the 7th index you mean you want column H
) because there is no column B
in the XML for that row. That is the reason most of the code you'll find that reads Excel files uses a loop to iterate over the cells.
Upvotes: 0
Reputation: 764
The basic algorithm to interate through rows involves two loops. One for rows the other for cells.
Suppose you have a worksheet.
This is how you get a collection of rows.
IEnumerable<Row> rows = worksheet.Descendants<Row>();
In a row you hav a collection of cells.
So you need to have a first loop on rows
foreach (Row row in rows)
{
}
Here you can get a collection of cell for a row
IEnumerable<Cell> cells = row.Descendants<Cell>()
And then you can iterate through cells in the inner loop
foreach (Cell cell in cells)
{
//Here goes the logic of reading cell value
}
Upvotes: 0