I have a Word document where every paragraph is a single very long line, something like:
"NameOfSomeSort ----ASDdASFA---F-TEXT-FASFASFAS----FASFASF"
The characters "TEXT" are highlighted. I need to find out which characters in the line are highlighted and get their position index within the line.
I was able to do it via Interop, but the operation takes roughly 5-10 hours to go through the whole document. So I tried OpenXML, but I can't get text properties such as Highlight when I loop through the paragraphs' text.
Highlight is a run-level property: it is stored in the run properties (<w:rPr>, RunProperties in the SDK), not on the paragraph (https://msdn.microsoft.com/en-us/library/documentformat.openxml.wordprocessing.highlight(v=office.14).aspx).
If your text is "aaaaa [i am highlight] bbbb", the Open XML will look like this (simplified):
<w:p>
  <w:r><w:t xml:space="preserve">aaaaa </w:t></w:r>
  <w:r>
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
    <w:t>[i am highlight]</w:t>
  </w:r>
  <w:r><w:t xml:space="preserve"> bbbb</w:t></w:r>
</w:p>
So, to find which text is highlighted, search for the highlight element with something like paragraph.Descendants<Highlight>(). The run that owns it is highlight.Parent.Parent (Highlight -> RunProperties -> Run).
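For comparison, here is a minimal sketch of the same search done with plain LINQ to XML on the paragraph's XML instead of the Open XML SDK, so it runs without the DocumentFormat.OpenXml package. The class and method names are hypothetical, and it assumes you already have the <w:p> element (for example from XDocument.Load on the main document part, usually word/document.xml inside the .docx).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: lists the text of every highlighted run in one <w:p> element.
public static class HighlightFinder
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml)
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static List<string> HighlightedTexts(XElement paragraph)
    {
        return paragraph.Descendants(W + "highlight")
            // w:val="none" explicitly means "no highlight", so skip it
            .Where(h => (string)h.Attribute(W + "val") != "none")
            // climb w:highlight -> w:rPr -> w:r
            .Select(h => h.Parent?.Parent)
            .OfType<XElement>()
            // a highlight under w:pPr (the paragraph mark) has no run grandparent, drop it
            .Where(run => run.Name == W + "r")
            // a run may hold several w:t pieces; join them
            .Select(run => string.Concat(run.Elements(W + "t").Select(t => t.Value)))
            .ToList();
    }
}
```

This mirrors the Descendants<Highlight>() approach: find the highlight element first, then walk up to the run that carries the text.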
If you also need the position, add up the text length of the runs that come before the highlighted run in the same paragraph:
// Suppose p is the Paragraph you want to inspect and r is the Run containing the Highlight
// (e.g. r = (Run)highlight.Parent.Parent). Needs: using DocumentFormat.OpenXml.Wordprocessing;
int pos = 0;
// Walk the runs of p in document order and add the text length of every run before r
foreach (Run run in p.Descendants<Run>())
{
    if (run == r)
        break;
    pos += run.InnerText.Length;
}
// pos should now be the zero-based index of the first highlighted character,
// i.e. p.InnerText.Substring(pos, r.InnerText.Length) is the highlighted text
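If you want every highlighted range of a paragraph in one pass (instead of one lookup per highlight), a sketch along the same lines keeps a running offset while walking the runs. It again uses LINQ to XML rather than the SDK; the names are hypothetical, and only <w:t> text is counted (tabs, breaks, deleted text and field codes are ignored as a simplification).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: zero-based (start, length) of highlighted text within one <w:p>.
public static class HighlightPositions
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public sealed record Span(int Start, int Length, string Color);

    public static List<Span> Find(XElement paragraph)
    {
        var spans = new List<Span>();
        int offset = 0; // running position in the paragraph's text

        // Descendants (not Elements) so runs inside w:hyperlink, w:ins, etc. are included
        foreach (XElement run in paragraph.Descendants(W + "r"))
        {
            // Simplification: only w:t contributes to the text length
            int length = run.Elements(W + "t").Sum(t => t.Value.Length);

            var color = (string)run.Element(W + "rPr")?.Element(W + "highlight")?.Attribute(W + "val");
            if (color != null && color != "none" && length > 0)
            {
                int n = spans.Count;
                // Word often splits one visible highlight over several runs; merge touching runs of the same color
                if (n > 0 && spans[n - 1].Start + spans[n - 1].Length == offset && spans[n - 1].Color == color)
                    spans[n - 1] = spans[n - 1] with { Length = spans[n - 1].Length + length };
                else
                    spans.Add(new Span(offset, length, color));
            }
            offset += length;
        }
        return spans;
    }
}
```

Merging adjacent runs matters in practice: Word may split "TEXT" into several runs (for example after editing or spell-checking), and each piece carries its own copy of <w:highlight>.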