Reputation: 1453
How do I limit the types of HTML that a user can input into a textbox? I'm running a small forum using some custom software that I'm beta testing, but I need to know how to limit the HTML input. Any suggestions?
Upvotes: 1
Views: 1566
Reputation: 1235
PHP comes with a simple function, strip_tags(), that strips HTML tags from a string. Its optional second argument lists tags that should not be stripped.
Example #1 strip_tags() example
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
The above example will output:
Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>
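One caveat worth showing, as a minimal sketch: strip_tags() filters by tag name only, so attributes on an allowed tag pass through unchanged, and the text inside a removed tag is kept. The input string below is a made-up example.

```php
<?php
// Hypothetical user input: an allowed <a> tag carrying an event-handler attribute.
$input = '<a href="#" onclick="alert(1)">Click</a><script>alert(2)</script>';

// The <script> tags are removed (their text content stays),
// but the onclick attribute survives on the allowed <a> tag.
$output = strip_tags($input, '<a>');
echo $output, "\n";
// prints: <a href="#" onclick="alert(1)">Click</a>alert(2)
```

So if any tags are allowed through, their attributes still need separate handling.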
Personally, for a forum I would use BBCode or Markdown because of the amount of support and the features available, such as live preview.
Upvotes: 0
Reputation: 63588
Regardless of what you use, make sure you know which kinds of HTML content can be dangerous.
For example, a <script> tag is pretty obvious, but a <style> tag is just as bad in IE, because it can invoke JScript commands.
In fact, any style="..." attribute can invoke script in IE.
<object> is one more tag to be wary of.
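As a rough sketch of acting on that list, the PHP function below uses the built-in DOM extension to drop <script>, <style> and <object> elements, plus style and on* attributes. The function name and the lists are illustrative assumptions, and they are not a complete set of dangerous constructs.

```php
<?php
// Sketch only: removeRiskyMarkup() is a hypothetical helper, not a full XSS filter.
function removeRiskyMarkup(string $html): string
{
    $doc = new DOMDocument();
    // Wrap the fragment in a known container; @ silences warnings on sloppy markup.
    // The <?xml ...> prefix is a common trick to make libxml treat the input as UTF-8.
    @$doc->loadHTML('<?xml encoding="UTF-8"><div id="root">' . $html . '</div>');
    $xpath = new DOMXPath($doc);

    // Remove whole elements that can carry script.
    foreach (iterator_to_array($xpath->query('//script|//style|//object')) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Remove style="..." and on* event-handler attributes from every element.
    foreach (iterator_to_array($xpath->query('//@*')) as $attr) {
        $name = strtolower($attr->nodeName);
        if ($name === 'style' || strncmp($name, 'on', 2) === 0) {
            $attr->ownerElement->removeAttribute($attr->nodeName);
        }
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $root = $xpath->query('//div[@id="root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeRiskyMarkup('<p style="color:red" onclick="x()">Hi</p><script>alert(1)</script>'), "\n";
```

A blacklist like this only covers the cases named above; a whitelist of allowed tags and attributes is usually the safer direction.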
Upvotes: 0
Reputation: 84503
I'd suggest a slightly different approach:
Keeping the stored user data clean gives you more flexibility in how it's displayed. Filtering all outgoing data is a good habit to get into (in line with the "never trust data" meme).
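A minimal sketch of that idea in PHP, assuming the post is stored exactly as typed and escaped only when it is written into a page. renderPost() is a hypothetical helper name.

```php
<?php
// Sketch: escape on output, not on input. renderPost() is an illustrative name.
function renderPost(string $raw): string
{
    // Convert &, <, >, " and ' to entities so the browser shows them as text.
    return htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo renderPost('<script>alert("hi")</script>'), "\n";
// prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

Because the stored text is untouched, the same post can later be rendered differently (plain text email, RSS, a new markup format) without data loss.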
Upvotes: 2
Reputation: 12476
You didn't state what the forum was built with, but if it's PHP, check out:
Library Features: Whitelist, Removal, Well-formed, Nesting, Attributes, XSS safe, Standards safe
Upvotes: 2
Reputation: 17546
Parse the input provided and strip out all HTML tags that don't exactly match the list you are allowing. This can either be a complex regex, or you can do a stateful iteration through the char[] of the input string, building the allowed string and stripping unwanted attributes on tags like img.
Use a different code system (BBCode, Markdown)
Find some code online that already does this, to use as a basis for your implementation. For example, Slashcode must perform this, so look for its implementation in Perl and use the regexes (that I assume are there).
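For the second option, a tiny BBCode renderer could look roughly like the PHP sketch below. The function name and the supported tags are an illustrative subset, not any particular library's behaviour. The key point is that all raw HTML is escaped first, and only fixed, known markup is generated afterwards.

```php
<?php
// Sketch of a minimal BBCode renderer; renderBbcode() and its tag list are assumptions.
function renderBbcode(string $text): string
{
    // Escape everything first, so any raw HTML the user typed becomes plain text.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Then translate a few known BBCode pairs into fixed HTML.
    $rules = [
        '~\[b\](.*?)\[/b\]~s'         => '<strong>$1</strong>',
        '~\[i\](.*?)\[/i\]~s'         => '<em>$1</em>',
        '~\[quote\](.*?)\[/quote\]~s' => '<blockquote>$1</blockquote>',
    ];
    return preg_replace(array_keys($rules), array_values($rules), $html);
}

echo renderBbcode('[b]Hi[/b] <script>'), "\n";
// prints: <strong>Hi</strong> &lt;script&gt;
```

Badly nested BBCode can still yield badly nested HTML here, which a fuller implementation would need to handle.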
Upvotes: 0
Reputation: 33455
Once the text is submitted, you could strip any/all tags that don't match your predefined set using a regex in PHP.
It would look something like the following:
find open tag (<)
if contents != allowed tag, remove tag (from <..>)
Upvotes: 0
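The two steps in the last answer can be sketched in PHP roughly as follows. filterTags() and its default tag list are hypothetical, and this version drops all attributes from allowed tags and escapes any stray "<" that is not part of a complete tag.

```php
<?php
// Sketch: find each tag, keep it only if its name is on the allowed list.
// filterTags() is an illustrative name; attributes are discarded entirely.
function filterTags(string $html, array $allowed = ['b', 'i', 'p']): string
{
    // Remove complete HTML comments first so they cannot hide markup.
    $html = preg_replace('~<!--.*?-->~s', '', $html);

    return preg_replace_callback(
        // Either a complete tag (optional slash, name, anything up to ">") or a lone "<".
        '~<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*>|<~',
        function (array $m) use ($allowed): string {
            if (!isset($m[2]) || $m[2] === '') {
                return '&lt;'; // a "<" that does not start a complete tag
            }
            $name = strtolower($m[2]);
            // Rebuild allowed tags without attributes; remove everything else.
            return in_array($name, $allowed, true) ? '<' . $m[1] . $name . '>' : '';
        },
        $html
    );
}

echo filterTags('<p onclick="x()">Hi <b>there</b></p><script>alert(1)</script>'), "\n";
// prints: <p>Hi <b>there</b></p>alert(1)
```

Regex-based filters like this are easy to get subtly wrong on malformed markup, so treat it as a starting point for testing rather than a finished sanitizer.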