Reputation: 771
We have a string that is read from a web page. Because browsers tolerate unencoded special characters (e.g. the ampersand), some pages encode them and some do not, so there is a good chance that some of our stored data was encoded once and some of it multiple times.
Is there a clean way to make sure a string is fully decoded, no matter how many times it was encoded?
Here is what we are using now:
public static string HtmlDecode(this string input)
{
    // Decode repeatedly until another pass no longer changes the string.
    var temp = HttpUtility.HtmlDecode(input);
    while (temp != input)
    {
        input = temp;
        temp = HttpUtility.HtmlDecode(input);
    }
    return input;
}
We do the same with UrlDecode.
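For reference, the UrlDecode counterpart might look like the sketch below. The class name `DecodeExtensions`, the method name `UrlDecodeFully`, and the `maxPasses` cap are my own assumptions, not part of the original code, and `WebUtility.UrlDecode` is used in place of `HttpUtility.UrlDecode` only to avoid a System.Web reference.

```csharp
using System;
using System.Net;

public static class DecodeExtensions
{
    // Hypothetical helper: repeats WebUtility.UrlDecode until the value stops
    // changing. maxPasses is an assumed safety cap, not something the
    // original code had.
    public static string UrlDecodeFully(this string input, int maxPasses = 10)
    {
        if (input == null) return null;

        for (int pass = 0; pass < maxPasses; pass++)
        {
            string decoded = WebUtility.UrlDecode(input);
            if (decoded == input) break;   // fixed point reached, nothing left to decode
            input = decoded;
        }
        return input;
    }
}
```

One caveat with the URL variant: `WebUtility.UrlDecode` also turns `+` into a space, so a value that legitimately contains a plus sign (for example `a%2Bb`, which decodes to `a+b`) ends up as `a b` after a second pass. Decoding until stable is therefore lossy for such data.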
Upvotes: 6
Views: 4524
Reputation: 15148
In case this is helpful to anyone, here is a recursive version for strings that were HTML-encoded multiple times (I find it a bit easier to read):
public static string HtmlDecode(string input) {
    string decodedInput = WebUtility.HtmlDecode(input);
    if (input == decodedInput) {
        // Nothing changed, so the string is fully decoded.
        return input;
    }
    return HtmlDecode(decodedInput);
}
WebUtility is in the System.Net namespace.
Upvotes: 1
Reputation: 1188
Your code seems to decode HTML strings correctly, repeating the check until the result stops changing.
However, if the input HTML is malformed, i.e. not encoded properly, the result may be unexpected: bad input might not be decoded properly no matter how many times it passes through this method.
A quick check with two encoded strings, one completely encoded and one only partially encoded, yielded the following results.
"&lt;b&gt;"
will decode to "<b>"
"<b>
will decode to "<b>"
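A check of this kind can be reproduced with a few lines. The input strings below are my own illustrative choices and not necessarily the ones tested above; the class name `DecodeDemo` is likewise hypothetical.

```csharp
using System;
using System.Net;

public static class DecodeDemo
{
    public static void Main()
    {
        // Completely encoded: both brackets are entities.
        Console.WriteLine(WebUtility.HtmlDecode("&lt;b&gt;"));         // <b>

        // Partially encoded: only the opening bracket is an entity.
        Console.WriteLine(WebUtility.HtmlDecode("&lt;b>"));            // <b>

        // Double encoded: a single pass only removes one layer.
        Console.WriteLine(WebUtility.HtmlDecode("&amp;lt;b&amp;gt;")); // &lt;b&gt;
    }
}
```

The first two inputs produce the same output, so once the text is decoded there is no way to tell how consistently the source page encoded it.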
Upvotes: 1
Reputation: 34802
That's probably the best approach, honestly. The real solution would be to rework your code so that everything is encoded exactly once in all places, so that you only ever need to decode once.
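A minimal sketch of that idea, assuming the scraped text can be normalized once when it enters the system: store the raw (decoded) form, and encode exactly once when writing HTML. The names `TextBoundary`, `ToRaw`, and `ToHtml` are hypothetical.

```csharp
using System;
using System.Net;

public static class TextBoundary
{
    // Ingest: called once when scraped text enters the system.
    // Strips however many encoding layers the source page happened to apply.
    public static string ToRaw(string scraped)
    {
        string current = scraped ?? string.Empty;
        string decoded = WebUtility.HtmlDecode(current);
        while (decoded != current)
        {
            current = decoded;
            decoded = WebUtility.HtmlDecode(current);
        }
        return current;
    }

    // Output: encode exactly once, right before the value is written into HTML.
    public static string ToHtml(string raw)
    {
        return WebUtility.HtmlEncode(raw);
    }
}
```

With this split, stored values are always raw text, so nothing downstream has to guess how many decode passes are needed.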
Upvotes: 3