Adam Mrozek

Reputation: 1480

HTMLAgilityPack using my own tags

I need to collect a few HTML elements into a list using the HTML Agility Pack and then remove them from the document. I wrote the following code:

HtmlDocument doc = new HtmlDocument();
doc.Load(tempFileHTML);
doc.OptionSupportOptionalEndTags = true;
doc.OptionWriteEmptyNodes = true;

List<HtmlNode> tagResolver = doc.DocumentNode.Descendants("link").ToList();
for (int i = 0; i < tagResolver.Count; i++)
{
    elements.Add(tagResolver[i].OuterHtml);
    tagResolver[i].Remove();
}

doc.Save(tempFileHTML, Encoding.GetEncoding(HTMLtoPDF.DefaultEncoding));

The problem is that my original HTML file looks like this:

<table>
    <LOOP>
        <tr>
            <td>{CODE}</td>
        </tr>
    </LOOP>
</table>

and after doc.Save() the file looks like this:

<table>
    <loop>
    </loop>
        <tr>
            <td>{CODE}</td>
        </tr>
</table>

Is there any way to save this document correctly?

Upvotes: 1

Views: 179

Answers (1)

jessehouwing

Reputation: 114491

The Agility Pack contains specific logic to enforce a correct document structure. It targets li, ul, table, tr and similar elements, so you are probably hitting it; see the HtmlDocument.GetResetters method. Turning that logic off with doc.OptionFixNestedTags = false should circumvent the behavior.
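A minimal sketch of that suggestion, assuming the HtmlAgilityPack NuGet package is referenced. Parsing happens inside Load/LoadHtml, so the parser options are set before loading here, whereas the code in the question sets them after doc.Load. The class and method names (LoopTemplateSketch, Reserialize) are made up for illustration, and OptionOutputOriginalCase is an extra option not mentioned above; it is included only because the saved file shows the tag lowercased. Whether this is enough to keep the tr inside LOOP may depend on your Agility Pack version, so compare the printed output with your input.

```csharp
using System;
using HtmlAgilityPack;

public static class LoopTemplateSketch
{
    // Hypothetical helper: parses the markup and serializes it again,
    // so the round-tripped output can be compared with the input.
    public static string Reserialize(string html)
    {
        var doc = new HtmlDocument();

        // Set parser options BEFORE loading; the document is parsed during LoadHtml.
        doc.OptionFixNestedTags = false;     // the setting suggested in this answer
        doc.OptionOutputOriginalCase = true; // assumption: write tag names in their original case

        doc.LoadHtml(html);
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        const string html =
            "<table><LOOP><tr><td>{CODE}</td></tr></LOOP></table>";

        // Print the round-tripped markup for manual inspection.
        Console.WriteLine(Reserialize(html));
    }
}
```

The same two option assignments can be moved above doc.Load(tempFileHTML) in the original code.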

You should register your tag(s) using HtmlNode.ElementsFlags.Add. Off the top of my head, the right syntax is:

HtmlNode.ElementsFlags.Add("LOOP",  HtmlElementFlag.Empty | HtmlElementFlag.Closed);

That way you can define how you expect the HtmlAgilityPack to parse your markers.

Also, there is a MixedCodeDocument class you can use. It requires you to specify tokens that delimit your own tags, so you could write <%loop%> instead, which might give you a way out. You can set TokenCodeStart and TokenCodeEnd on the document before parsing.

Upvotes: 2
