Reputation: 117
I'm creating a chat widget for my web site. The users will be able to input plain text only, no HTML.
In an effort to eliminate HTML tags AND to allow users to use "<" and ">", I'm sanitizing their input in PHP: strip_tags() on the input, and htmlentities() on the output to the users' screens. One problem is that if a user inputs "Russia<China", strip_tags() greedily eliminates everything after the "<".
My question is: if I use a regex to insert a space between a "<" and the next non-space character, will that help me eliminate the threat of XSS? Will it prevent a potential HTML tag from rendering on the user's screen?
Say something like this slips through:
< script type='text/javascript'>alert('some malicious code');< /script>
One advantage of creating that space (e.g. "< script... >") seems to be that strip_tags() will leave the "<" alone.
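The spacing step proposed above can be sketched with a single lookahead regex. This is Python rather than PHP so it runs standalone; in PHP the same pattern would be passed to preg_replace(). The function name space_after_lt is hypothetical. Note that it rewrites legitimate text as well as tags:

```python
import re


def space_after_lt(text: str) -> str:
    # Insert a space after every "<" that is immediately followed by a
    # non-whitespace character, as proposed in the question.
    return re.sub(r"<(?=\S)", "< ", text)


print(space_after_lt("Russia<China"))                  # Russia< China
print(space_after_lt("<script>alert('x')</script>"))   # < script>alert('x')< /script>
```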
Thanks for any suggestions.
Upvotes: 2
Views: 1843
Reputation: 28656
The added space is enough to stop tags from being stripped by strip_tags(), and from being rendered as HTML by browsers.
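The space works because an HTML tokenizer only starts a tag when "<" is immediately followed by a letter (or by "/", "!" or "?" for end tags, comments and similar constructs); otherwise the "<" is emitted as plain text. As a rough approximation of a browser, not a browser itself, Python's standard html.parser can report which start tags it recognizes. The class and helper names below are hypothetical:

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records the name of every start tag the parser recognizes."""

    def __init__(self) -> None:
        super().__init__()
        self.start_tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def start_tags(markup: str) -> list[str]:
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered input
    return parser.start_tags


print(start_tags("<script type='text/javascript'>alert(1);</script>"))   # ['script']
print(start_tags("< script type='text/javascript'>alert(1);< /script>"))  # []
```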
But at what point exactly would you use such a regular expression? If you add it after you've done strip_tags(), legitimate text will already have been stripped. If you add it before strip_tags(), there won't be any tags left to strip, so users will see the spaced HTML tags as text.
But if they're going to see (mangled) tags anyway, why are you doing this at all? You can just use htmlspecialchars(), which you have to do anyway.
Even an HTML parser isn't going to help you, because an HTML parser would consider the <China in your example a tag too.
And is the person typing a<b making a comparison, talking about HTML, trying to add emphasis, or trying to inject a malicious script?
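Escaping sidesteps the question of intent entirely: every character is displayed exactly as typed, and nothing is interpreted as markup. A minimal sketch, using Python's html.escape() as a stand-in for PHP's htmlspecialchars() (the wrapper name escape_message is hypothetical, and PHP's exact output differs in small details such as how a single quote is encoded):

```python
import html


def escape_message(text: str) -> str:
    # Escape &, <, > and both quote characters so the browser shows the
    # text literally instead of parsing it as markup.
    return html.escape(text, quote=True)


for message in ["Russia<China", "a<b", "<script>alert('x')</script>"]:
    print(escape_message(message))
# Russia&lt;China
# a&lt;b
# &lt;script&gt;alert(&#x27;x&#x27;)&lt;/script&gt;
```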
Upvotes: 4
Reputation: 50041
Just use htmlspecialchars(). It's the only function you need for making user text safe to output as HTML. XSS threats are obliterated provided you apply it consistently to every piece of user-supplied text you output. Follow that with nl2br() if you want to display multiple lines; otherwise the text will appear on one line.
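The order matters: escape first, then convert line breaks, or the inserted <br /> tags would themselves be escaped. A Python sketch of that pipeline, where nl2br() is a hypothetical re-implementation of PHP's function (it inserts "<br />" before each line break and keeps the break) and render_chat_message() is a hypothetical name:

```python
import html
import re


def nl2br(text: str) -> str:
    # Insert "<br />" before each line break (\r\n, \n\r, \n or \r),
    # keeping the original break characters, as PHP's nl2br() does.
    return re.sub(r"(\r\n|\n\r|\n|\r)", r"<br />\1", text)


def render_chat_message(text: str) -> str:
    # Escape first, then add <br />; the reverse order would escape
    # the <br /> tags themselves.
    return nl2br(html.escape(text))


print(render_chat_message("Russia<China\nsecond line"))
# Russia&lt;China<br />
# second line
```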
strip_tags() is never, ever, ever the right function for sanitizing HTML. At best, it will eat or mangle certain valid text. At worst, if the allowed_tags parameter is used, it won't sanitize anything, because attributes (such as event handlers) on the allowed tags are kept. It also leaves HTML entities untouched.
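These failure modes can be illustrated with a deliberately simplified model of a strip_tags()-style filter. This is a hypothetical Python function, not PHP's actual algorithm (which handles quotes, comments and other edge cases differently), but it reproduces the behaviors described: text after a stray "<" is eaten, allowed tags keep their attributes, and entities pass through unchanged:

```python
import re

# Anything from "<" up to the next ">" (or the end of the string).
TAG_LIKE = re.compile(r"<[^>]*>?")
# The tag name at the start of a tag-like match, e.g. "b" in "</b>".
TAG_NAME = re.compile(r"<\s*/?\s*([A-Za-z][A-Za-z0-9]*)")


def naive_strip_tags(text: str, allowed: frozenset[str] = frozenset()) -> str:
    """Simplified model of a strip_tags()-style filter (not PHP's code)."""

    def replace(match: re.Match) -> str:
        name = TAG_NAME.match(match.group(0))
        if name and name.group(1).lower() in allowed:
            return match.group(0)  # allowed tag kept verbatim, attributes too
        return ""

    return TAG_LIKE.sub(replace, text)


print(naive_strip_tags("Russia<China is big"))  # Russia
print(naive_strip_tags('<b onmouseover="alert(1)">hi</b>', frozenset({"b"})))
# <b onmouseover="alert(1)">hi</b>
```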
Upvotes: 2