MichaelD

Reputation: 8777

Encode HTML in ASP.NET C# but leave tags intact

I need to encode a whole text while leaving the < and > intact.

Example:

<p>Give me 100.000 €!</p>

must become:

<p>Give me 100.000 &euro;!</p>

The HTML tags must remain intact.

Upvotes: 3

Views: 3214

Answers (5)

David Kirkland

Reputation: 2461

As others have suggested, this can be achieved with HtmlAgilityPack.

using System.IO;
using System.Web;
using HtmlAgilityPack;

public static class HtmlTextEncoder
{
    public static string HtmlEncode(string html)
    {
        if (html == null) return null;

        // Parse the markup so tags and text can be told apart.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        EncodeNode(doc.DocumentNode);

        // Serialize the modified document back to a string.
        doc.OptionWriteEmptyNodes = true;
        using (var s = new MemoryStream())
        {
            doc.Save(s);
            var encoded = doc.Encoding.GetString(s.ToArray());
            return encoded;
        }
    }

    // Recursively encodes the content of every text node; element nodes are left untouched.
    private static void EncodeNode(HtmlNode node)
    {
        if (node.HasChildNodes)
        {
            foreach (var childNode in node.ChildNodes)
            {
                if (childNode.NodeType == HtmlNodeType.Text)
                {
                    childNode.InnerHtml = HttpUtility.HtmlEncode(childNode.InnerHtml);
                }
                else
                {
                    EncodeNode(childNode);
                }
            }
        }
        else if (node.NodeType == HtmlNodeType.Text)
        {
            node.InnerHtml = HttpUtility.HtmlEncode(node.InnerHtml);
        }
    }
}

This walks every node in the parsed HTML and replaces the content of each text node with its HTML-encoded form, so the tags themselves are never touched.

I've created a .NET fiddle to demonstrate this technique.

Upvotes: 0

Guffa

Reputation: 700212

Use a regular expression that matches either a tag or what's between tags, and encode what's between:

html = Regex.Replace(
  html,
  "(<[^>]+>|[^<]+)",
  m => m.Value.StartsWith("<") ? m.Value : HttpUtility.HtmlEncode(m.Value)
);
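To try the regular expression in isolation, here is a minimal self-contained sketch. The class name TagPreservingEncoder and the sample input are made up for illustration, and System.Net.WebUtility.HtmlEncode is swapped in for HttpUtility.HtmlEncode only so that the snippet compiles without a reference to System.Web.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the regex shown above.
public static class TagPreservingEncoder
{
    public static string Encode(string html)
    {
        if (html == null) return null;

        // Each match is either a whole tag ("<...>") or a run of text between tags.
        return Regex.Replace(
            html,
            "(<[^>]+>|[^<]+)",
            m => m.Value.StartsWith("<")
                ? m.Value                         // tag: keep as-is
                : WebUtility.HtmlEncode(m.Value)  // text: encode
        );
    }
}

public static class Demo
{
    public static void Main()
    {
        // prints "<p>Tom &amp; Jerry</p>"
        Console.WriteLine(TagPreservingEncoder.Encode("<p>Tom & Jerry</p>"));
    }
}
```

One caveat with this pattern: a literal "<" inside the text (for example "<p>1 < 2</p>") is taken as the start of a tag, so everything up to the next ">" is left unencoded.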

Upvotes: 4

Ben Hoffman

Reputation: 8259

You could use HtmlTextWriter in addition to HtmlEncode. You would use HtmlTextWriter to set up your <p></p> and then set the body of the <p></p> using HtmlEncode. HtmlTextWriter offers ToString() and a bunch of other methods, so it shouldn't be much more code.
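A rough sketch of that idea, assuming the classic .NET Framework System.Web assembly is available. The class and method names (ParagraphBuilder, BuildParagraph) are invented for illustration, and this only helps when you are generating the markup yourself rather than encoding an existing HTML string.

```csharp
using System.IO;
using System.Web;
using System.Web.UI;

// Hypothetical helper: writes one <p> element whose body is HTML-encoded.
public static class ParagraphBuilder
{
    public static string BuildParagraph(string text)
    {
        using (var sw = new StringWriter())
        using (var writer = new HtmlTextWriter(sw))
        {
            writer.RenderBeginTag(HtmlTextWriterTag.P);   // opening <p>
            writer.Write(HttpUtility.HtmlEncode(text));   // encoded body
            writer.RenderEndTag();                        // closing </p>
            writer.Flush();

            // The markup ends up in the underlying StringWriter.
            return sw.ToString();
        }
    }
}
```

The writer may add its own whitespace around the tags, so compare the result loosely if you test it.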

Upvotes: 0

user57508

Reputation:

You could use Html Agility Pack to parse the markup and then encode the text content of each tag.

Upvotes: 2

Stefan

Reputation: 11509

Maybe use string.Replace for just those characters you want to encode?
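A minimal sketch of that approach, assuming you know up front which characters need entities. The class name SelectiveEncoder and the three-entry mapping are illustrative only, and a single pass over the string stands in for repeated string.Replace calls.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: replaces only the listed characters, so "<" and ">" pass through.
public static class SelectiveEncoder
{
    // Illustrative mapping; extend it with whatever characters you need.
    private static readonly Dictionary<char, string> Entities = new Dictionary<char, string>
    {
        { '€', "&euro;" },
        { '£', "&pound;" },
        { '©', "&copy;" },
    };

    public static string Encode(string html)
    {
        if (html == null) return null;

        var sb = new StringBuilder(html.Length);
        foreach (var c in html)
        {
            string entity;
            if (Entities.TryGetValue(c, out entity))
                sb.Append(entity);   // mapped character: emit its entity
            else
                sb.Append(c);        // everything else, including tags, is copied unchanged
        }
        return sb.ToString();
    }
}

public static class Demo
{
    public static void Main()
    {
        // prints "<p>Give me 100.000 &euro;!</p>"
        Console.WriteLine(SelectiveEncoder.Encode("<p>Give me 100.000 €!</p>"));
    }
}
```

This leaves "&" alone on purpose, since it cannot tell a bare ampersand from one that already starts an entity.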

Upvotes: 1
