Reputation: 10872
I count rows in two worksheets like this:
foreach (WorksheetPart worksheetPart in workbookPart.WorksheetParts)
{
OpenXmlPartReader reader = new OpenXmlPartReader(worksheetPart);
if (count == 0)
{
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
count_first++;
}
}
}
else if (count == 1)
{
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
count_second++;
}
}
}
count++;
}
For both worksheets in count_first
and count_second
I get twice as much as there are rows with data. Why is that and what does it actually mean? Does it mean that OpenXML
parses each list twice?
EDIT
Well, I found a solution. To get it right away, I guess, you should keep this sacred knowledge in some secret place. So, here it is:
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
do
{
count_first++;
} while (reader.ReadNextSibling());
}
}
Upvotes: 3
Views: 803
Reputation: 12815
The reason you are getting twice the count is due to the way the OpenXmlReader
reads each element. The reader treats the open and close nodes as independant items which can be differentiated by checking the IsStartElement
and IsEndElement
properties.
To demonstrate this you can run something like this:
using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
{
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
do
{
Console.WriteLine("{0} {1} {2}",
reader.ElementType,
reader.IsStartElement,
reader.IsEndElement);
} while (reader.Read());
Console.WriteLine("Finished");
}
}
}
Which will produce output along the lines of the following* for a sheet with two rows and two columns (I've highlighted the Rows for readibility):
Row True False
Cell True False
CellValue True False
CellValue False True
Cell False True
Cell True False
CellValue True False
CellValue False True
Cell False True
Row False True
Row True False
Cell True False
CellValue True False
CellValue False True
Cell False True
Cell True False
CellValue True False
CellValue False True
Cell False True
Row False True
There are 2 ways that you can solve this depending on how you want to read the document. The first way (as you point out in your answer) is to move to the next sibling by calling ReadNextSibling
- this essentially "jumps" the end element (and any children of the Row
). Changing the above example to use ReadNextSibling
in the do
loop:
do
{
Console.WriteLine("{0} {1} {2}",
reader.ElementType,
reader.IsStartElement,
reader.IsEndElement);
} while (reader.ReadNextSibling());
You'll get output* of:
Row True False
Row True False
The second way would be to just count the start elements (or indeed the end elements; just not both):
while (reader.Read())
{
if (reader.ElementType == typeof(Row) && reader.IsStartElement)
{
count_first++;
}
}
Which one you choose depends on whether you wish to read the Cell
values and how you'd like to read them (SAX or DOM).
* In reality each row is prefixed with the namespace of "DocumentFormat.OpenXml.Spreadsheet." which I've removed for readibility.
Upvotes: 5
Reputation: 10872
This works as expected:
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
do
{
count_first++;
} while (reader.ReadNextSibling());
}
}
Upvotes: 0