InstilledBee

Reputation: 131

Encode only accented characters in an HTML string

I have the following function that accepts an HTML string, for example "<p>áêö</p>":

public string EncodeString(string input)
{
    // ...
    return System.Net.WebUtility.HtmlEncode(input);
}

I'd like to modify that function to output the same string, but with the accented characters as HTML entities. Using System.Net.WebUtility.HtmlEncode() encodes the entire string, including the HTML tags. I'd like to preserve the HTML tags if possible, since the string is parsed and rendered elsewhere in the application. Is this something that is better solved with a regex?
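On the regex question: every character that makes up tag syntax (<, >, /, =, quotes, and ordinary tag and attribute names) is ASCII. So one option is to leave the markup alone and turn only non-ASCII characters into numeric character references. The sketch below assumes C#/.NET and HTML with no <script> or <style> content, where a reference like &#225; would not be decoded. The names Program, NonAscii and EncodeString are illustrative only.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class Program
{
    // Matches either a UTF-16 surrogate pair (one code point above U+FFFF) or a
    // single non-ASCII UTF-16 code unit. ASCII, including all tag syntax, never matches.
    private static readonly Regex NonAscii =
        new Regex(@"[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\u0000-\u007F]");

    // Replaces each non-ASCII character with a decimal numeric character
    // reference (&#NNN;) and leaves '<', '>', '&' and all other ASCII untouched.
    // A lone (unpaired) surrogate is emitted as its code-unit value.
    public static string EncodeString(string input)
    {
        return NonAscii.Replace(input, m =>
        {
            int codePoint = m.Value.Length == 2
                ? char.ConvertToUtf32(m.Value[0], m.Value[1])
                : m.Value[0];
            return "&#" + codePoint + ";";
        });
    }

    public static void Main()
    {
        string html = "<p>áêö</p>";
        // For comparison: HtmlEncode escapes the tags as well.
        Console.WriteLine(WebUtility.HtmlEncode(html)); // &lt;p&gt;&#225;&#234;&#246;&lt;/p&gt;
        Console.WriteLine(EncodeString(html));          // <p>&#225;&#234;&#246;</p>
    }
}
```

Because the regex never matches ASCII, existing entities such as &amp; pass through unchanged. Unlike WebUtility.HtmlEncode, this also converts characters above U+00FF.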

Upvotes: 0

Views: 891

Answers (2)

greenjaed

Reputation: 621

You can use a library like AngleSharp to parse the HTML and replace the text content of an element:

using System.Net;
using System.Threading.Tasks;
using AngleSharp;

public static async Task<string> EncodeString(string input)
{
    var context = BrowsingContext.New(Configuration.Default);
    var document = await context.OpenAsync(req => req.Content(input));
    var pElement = document.QuerySelector("p");
    if (pElement == null)
        return input; // no <p> element: leave the input unchanged
    pElement.TextContent = WebUtility.HtmlEncode(pElement.TextContent);
    return pElement.ToHtml();
}

See it in action here: .NET Fiddle


For more general situations where you have nested elements, here's the adapted code:

using System.Net;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Dom;

public static async Task<string> EncodeString(string input)
{
    var context = BrowsingContext.New(Configuration.Default);
    var document = await context.OpenAsync(req => req.Content(input));
    return EncodeString(document.Body.FirstChild);
}

// Recursively HTML-encodes every text node under content, then serializes it.
private static string EncodeString(INode content)
{
    foreach (var node in content.ChildNodes)
    {
        if (node.NodeType == NodeType.Text)
            node.NodeValue = WebUtility.HtmlEncode(node.NodeValue);
        else
            EncodeString(node); // elements have no NodeValue to set; just recurse
    }
    return content.ToHtml();
}
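The same "encode each text node, keep the tags" idea can be approximated without a parser. This sketch assumes no '>' inside attribute values and no <script>, <style>, or comments. It copies everything from '<' through the next '>' verbatim and re-encodes the text between tags. Each text run is decoded first, so existing entities like &amp; are not double-encoded. WebUtility.HtmlEncode only emits numeric references for U+00A0 to U+00FF, which covers áêö. The names Program and EncodeTextRuns are hypothetical.

```csharp
using System;
using System.Net;
using System.Text;

public static class Program
{
    // Copies tags verbatim and HTML-encodes the text between them.
    // Assumption: '>' never appears inside attribute values, and there are no
    // <script>/<style> blocks or comments (this is not a real HTML parser).
    public static string EncodeTextRuns(string html)
    {
        var output = new StringBuilder(html.Length);
        int i = 0;
        while (i < html.Length)
        {
            if (html[i] == '<')
            {
                // Markup: copy from '<' up to and including the next '>'.
                int close = html.IndexOf('>', i);
                if (close < 0) close = html.Length - 1; // unterminated tag: copy the rest
                output.Append(html, i, close - i + 1);
                i = close + 1;
            }
            else
            {
                // Text: decode existing entities, then encode once.
                int next = html.IndexOf('<', i);
                if (next < 0) next = html.Length;
                string text = html.Substring(i, next - i);
                output.Append(WebUtility.HtmlEncode(WebUtility.HtmlDecode(text)));
                i = next;
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(EncodeTextRuns("<div>ü <b>é &amp; ö</b></div>"));
        // <div>&#252; <b>&#233; &amp; &#246;</b></div>
    }
}
```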

Upvotes: 1

Gunnarhawk

Reputation: 437

This is quite possibly the oddest solution, but it works when the input is a single element with no nested tags, such as "<p>áêö</p>":

public static string EncodeString(string input)
{
    // Assumes one wrapping element with no nested tags, e.g. "<p>...</p>".
    int startEnd = input.IndexOf('>') + 1;   // index just past the opening tag
    int endStart = input.LastIndexOf("</");  // index where the closing tag begins
    string startTag = input.Substring(0, startEnd);
    string endTag = input.Substring(endStart);
    string inner = input.Substring(startEnd, endStart - startEnd);
    return startTag + System.Net.WebUtility.HtmlEncode(inner) + endTag;
}

Upvotes: 1
