Reputation: 79
I posted a question yesterday with great success: it did not give me exactly what I needed, but it was more than enough to put me on the right path. I have run into another difficulty and am hoping to find similar guidance.
I have a document with several different types of elements, some of which can be nested within others. Whenever a certain element is present, I need to remove all tags inside it and leave only the inner HTML.
For example, if the element pnum is present, I need to take the whole element and remove any inner elements, leaving behind only the inner html.
input:
<li>
<pnum>
blah blah
<linum>hello hello</linum>
good bye
<title>good morning</title>
</pnum>
</li>
output:
<li>
blah blah
hello hello
good bye
good morning
</li>
I was able to do this using HtmlAgilityPack, but I had to traverse every node and the performance is not great. I am wondering if there is a quicker XSLT transform I can perform on the document.
Thanks in advance!
Upvotes: 3
Views: 441
Reputation: 167716
I am not sure where you have taken the term innerHTML from, but since IE 4 it has usually included the markup, so your request to strip markup does not seem to be related to innerHTML.
As for XSLT, you can use
<xsl:template match="li[.//pnum]">
<xsl:copy>
<xsl:value-of select="."/>
</xsl:copy>
</xsl:template>
to have any li element with a pnum descendant transformed to an li with only the text contents.
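On its own that template only covers the matching li elements. To leave the rest of the document unchanged you would normally pair it with the identity template. Here is a minimal sketch of a complete XSLT 1.0 stylesheet, assuming the input is well-formed XML (HTML that is not well-formed would have to be parsed into a tree first) and assuming everything other than the matched li elements should be copied through as is:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: by default, copy every node and attribute unchanged.
       (Assumption: the rest of the document should pass through as is.) -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for any li that has a pnum descendant: keep the li element
       itself, but replace its content with its string value, i.e. the
       concatenated text of all its descendants. -->
  <xsl:template match="li[.//pnum]">
    <xsl:copy>
      <xsl:value-of select="."/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The second template takes precedence over the identity template because a match pattern with a predicate has a higher default priority than node(). Note that xsl:value-of keeps the whitespace between the text fragments and that attributes on the matched li are not copied; if you need them, add an xsl:copy-of select="@*" before the xsl:value-of.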
Upvotes: 1