Reputation: 1480
I need to collect a few HTML elements into a list using Html Agility Pack and then remove them from the document. I wrote the following code:
HtmlDocument doc = new HtmlDocument();
doc.Load(tempFileHTML);
doc.OptionSupportOptionalEndTags = true;
doc.OptionWriteEmptyNodes = true;
List<HtmlNode> tagResolver = doc.DocumentNode.Descendants("link").ToList();
for (int i = 0; i < tagResolver.Count; i++)
{
    elements.Add(tagResolver[i].OuterHtml);
    tagResolver[i].Remove();
}
doc.Save(tempFileHTML, Encoding.GetEncoding(HTMLtoPDF.DefaultEncoding));
The problem is that my original HTML file looks like this:
<table>
<LOOP>
<tr>
<td>{CODE}</td>
</tr>
</LOOP>
</table>
and after doc.Save() this file looks like this:
<table>
<loop>
</loop>
<tr>
<td>{CODE}</td>
</tr>
</table>
Is there any way to save this document correctly?
Upvotes: 1
Views: 179
Reputation: 114491
There is some specific logic in the Agility Pack to enforce a correct structure. This code specifically targets li, ul, table, tr, etc., so you might be hitting this. See the HtmlDocument.GetResetters method. Turning off OptionFixNestedTags, using doc.OptionFixNestedTags = false, should circumvent that behavior.
You should register your tag(s) using HtmlNode.ElementsFlags.Add. Off the top of my head, the right syntax is:
HtmlNode.ElementsFlags.Add("LOOP", HtmlElementFlag.Empty | HtmlElementFlag.Closed);
That way you can define how you expect the HtmlAgilityPack to parse your markers.
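Putting the two suggestions together, a minimal sketch might look like the following. It assumes the HtmlAgilityPack package is referenced; the lower-case "loop" key, the inline test markup, and the class name are illustrative assumptions, and the flag combination is simply the one quoted above from memory, so it may need adjusting. Note that options which influence parsing are set before the document is loaded.

```csharp
using System;
using HtmlAgilityPack;

class LoopTagSketch
{
    static void Main()
    {
        // Register the custom tag before any document is parsed.
        // The indexer is used instead of Add so that running this twice does not throw.
        // Assumption: the parser looks tags up by lower-case name.
        HtmlNode.ElementsFlags["loop"] = HtmlElementFlag.Empty | HtmlElementFlag.Closed;

        var doc = new HtmlDocument();

        // Set parsing options before loading; changing them after Load
        // does not re-parse the document.
        doc.OptionFixNestedTags = false;
        doc.OptionWriteEmptyNodes = true;

        // Same structure as the file in the question, inlined for the sketch.
        doc.LoadHtml("<table><LOOP><tr><td>{CODE}</td></tr></LOOP></table>");

        // Print the serialized document so it can be compared with the input.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Whether the nesting survives can depend on the Html Agility Pack version, so compare the printed markup with the input before switching back to doc.Load and doc.Save on the real file.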
Also: there is a MixedCodeDocument class which you can use as well. It requires you to specify a token for your own tags, so you could use <%loop%>, and it might provide an escape for you. You can set the start and end tokens (the TokenCodeStart and TokenCodeEnd fields) on the document before parsing.
Upvotes: 2