juliushuck

Reputation: 1684

HTML Agility Pack not changing text of text node

I want to replace ## with ++ in an HTML document (but just in text nodes).

I'm using HTML Agility Pack to manipulate the document. This is my code:

private static void Main(string[] args)
{
    var htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml("<html><p>This is a test paragraph ##</p><a>Not here ##</a><div><p>Nested paragraph ##</p></div></html>");
    Console.WriteLine(htmlDoc.Text);
    GenerateLinksInHtmlNode(htmlDoc.DocumentNode.ChildNodes);
    Console.WriteLine(htmlDoc.Text);
    Console.ReadKey();
}

private static void GenerateLinksInHtmlNode(HtmlNodeCollection htmlNodeColl)
{
    foreach (var childNode in htmlNodeColl)
    {
        switch (childNode.NodeType)
        {
            case HtmlNodeType.Document:
            case HtmlNodeType.Element:
                GenerateLinksInHtmlNode(childNode.ChildNodes);
                break;
            case HtmlNodeType.Text when childNode.ParentNode.Name == "a":
                continue;
            case HtmlNodeType.Text:
            {
                var txtNode = (HtmlTextNode) childNode;
                txtNode.Text = GenerateLinks(txtNode.Text);
                break;
            }
        }
    }
}

private static string GenerateLinks(string txt)
{
    return Regex.Replace(txt, "##", "++");
}

When I debug it, I can see that each text node's Text is replaced where it should be. But the second Console.WriteLine() prints the same text as the first one.

Upvotes: 2

Views: 652

Answers (1)

Alexander Petrov

Reputation: 14231

The HtmlDocument.Text property holds the original source text. It is set when the document is loaded and does not change afterwards, even when nodes are modified. See the HtmlDocument source.

Use the InnerHtml or OuterHtml property of the DocumentNode to see the changes:

Console.WriteLine(htmlDoc.DocumentNode.InnerHtml);
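
To see the difference side by side, here is a minimal sketch that prints both values after the replacement. It assumes the HtmlAgilityPack NuGet package is referenced; the Program class name, the shortened input, and the use of Descendants() in place of the question's recursive method are illustrative choices, not part of the original code.

```csharp
using System;
using HtmlAgilityPack;

internal static class Program
{
    private static void Main()
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml("<html><p>This is a test paragraph ##</p><a>Not here ##</a></html>");

        // Visit every node below the document root and rewrite text nodes,
        // skipping text that sits directly inside an <a> element.
        foreach (var node in htmlDoc.DocumentNode.Descendants())
        {
            if (node is HtmlTextNode textNode && node.ParentNode.Name != "a")
            {
                textNode.Text = textNode.Text.Replace("##", "++");
            }
        }

        // Source text captured at load time; not affected by the edits above.
        Console.WriteLine(htmlDoc.Text);

        // Markup serialized from the current node tree; reflects the edits.
        Console.WriteLine(htmlDoc.DocumentNode.OuterHtml);
    }
}
```

The first line printed should still contain ## in the paragraph, while the second should show ++ there and leave the text inside the a element untouched.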

Upvotes: 2
