Reputation: 85715
I want to check which HTML tags a user has entered in a rich HTML editor I have, but I am not sure how to do this in C#.
Should I be using Regex, and which HTML tags should I be blacklisting/whitelisting?
Upvotes: 2
Views: 7138
Reputation: 1892
// Requires: using System.Linq;
// Returns StringToSanitize with every character that is not in AllowedCharacters removed,
// or null if either argument is empty.
string StringWhitelist(string StringToSanitize, string AllowedCharacters)
{
    if (StringToSanitize.Length == 0 || AllowedCharacters.Length == 0)
        return null;
    // Keep only the characters that appear in the whitelist.
    return new string(StringToSanitize
        .Where(c => AllowedCharacters.IndexOf(c) != -1)
        .ToArray());
}
// Requires: using System.Linq;
// Returns StringToSanitize with every character that is in NotAllowedCharacters removed,
// or null if either argument is empty.
string StringBlacklist(string StringToSanitize, string NotAllowedCharacters)
{
    if (StringToSanitize.Length == 0 || NotAllowedCharacters.Length == 0)
        return null;
    // Drop every character that appears in the blacklist.
    return new string(StringToSanitize
        .Where(c => NotAllowedCharacters.IndexOf(c) == -1)
        .ToArray());
}
Usage:
StringWhitelist("Ciao", "abcdefghjklmnopqrstuvwxyz"); // Output: ao (because "C" and "i" are not in the whitelist)
StringBlacklist("Ciao", "Ci"); // Output: ao (because "C" and "i" are in the blacklist)
Upvotes: 0
Reputation: 217233
Assuming the tags are entered as a single string, like here on StackOverflow, you'll want to split the string into individual tags first:
string[] tags = "c# html lolcat ".Split(
new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
A white-/blacklist can be represented using a HashSet<T> storing the tags:
HashSet<string> blacklist = new HashSet<string>(
StringComparer.CurrentCultureIgnoreCase) { "lolcat", "lolrus" };
Then check whether any of the tags is on the list:
bool invalid = tags.Any(blacklist.Contains);
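The same set can serve as a whitelist by inverting the check, so that every tag must be on the list. A minimal sketch, assuming a hypothetical whitelist of "c#" and "html"; the class name TagCheck and the method AllAllowed are illustrative and not part of the answer above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TagCheck
{
    // Hypothetical whitelist; the entries are placeholders.
    static readonly HashSet<string> Whitelist = new HashSet<string>(
        StringComparer.CurrentCultureIgnoreCase) { "c#", "html" };

    // Returns true only when every tag in the space-separated input is on the whitelist.
    public static bool AllAllowed(string input)
    {
        string[] tags = input.Split(
            new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return tags.All(Whitelist.Contains);
    }
}
```

Note that an input containing no tags at all passes this check trivially, so an empty input may need separate handling.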
Upvotes: 0
Reputation: 217233
A simple whitelisting approach:
string input = "<span><b>99</b> < <i>100</i></span> <!-- 99 < 100 -->";
// escape &, < and >
input = input.Replace("&", "&amp;").Replace(">", "&gt;").Replace("<", "&lt;");
// unescape whitelisted tags
string output = input.Replace("&lt;b&gt;", "<b>").Replace("&lt;/b&gt;", "</b>")
    .Replace("&lt;i&gt;", "<i>").Replace("&lt;/i&gt;", "</i>");
Output:
&lt;span&gt;<b>99</b> &lt; <i>100</i>&lt;/span&gt; &lt;!-- 99 &lt; 100 --&gt;
Rendered output (99 in bold, 100 in italics):
<span>99 < 100</span> <!-- 99 < 100 -->
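The escape-then-restore idea can be generalized to any list of attribute-free tags. The following is only a sketch: the class TagWhitelist and the method Sanitize are hypothetical names, and it swaps the hand-written Replace chain for WebUtility.HtmlEncode from System.Net, which also escapes quotes.

```csharp
using System;
using System.Net;

static class TagWhitelist
{
    // Hypothetical helper: escapes the whole input, then restores
    // only the bare opening and closing forms of the allowed tags.
    public static string Sanitize(string input, params string[] allowedTags)
    {
        string output = WebUtility.HtmlEncode(input);
        foreach (string tag in allowedTags)
        {
            output = output
                .Replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                .Replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return output;
    }
}
```

For example, TagWhitelist.Sanitize("<b>hi</b><script>x</script>", "b", "i") leaves the b element intact and returns the script element in escaped form. The match is case-sensitive and exact, so <B> or a tag carrying attributes (such as <b onclick="...">) stays escaped, which is the safe default for this approach.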
Upvotes: 1
Reputation: 25523
You might try the Html Agility Pack. I haven't used it to strip tags, but it can certainly find them.
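As a rough sketch of what finding tags could look like, the following lists the element names in a fragment that are not on a whitelist. It assumes the HtmlAgilityPack NuGet package is referenced; the class TagFinder and the method FindDisallowedTags are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package, not part of the .NET base library

static class TagFinder
{
    // Returns the distinct element names in the markup that are not in allowedTags.
    public static List<string> FindDisallowedTags(string html, ISet<string> allowedTags)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.Descendants()
            .Where(node => node.NodeType == HtmlNodeType.Element) // skip text and comment nodes
            .Select(node => node.Name)
            .Where(name => !allowedTags.Contains(name))
            .Distinct()
            .ToList();
    }
}
```

Passing a case-insensitive set, for example new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "b", "i" }, avoids depending on how the parser cases element names. Because the check runs on parsed nodes rather than raw text, it is less easily fooled by unusual spacing or attributes than a Regex or string match.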
Upvotes: 0