Reputation: 25336
I'm writing a server-side script that replaces all URLs in a body of text with <a>
tag versions (so they can be clicked).
How can I make sure that the URLs I convert do not contain any XSS-style JavaScript?
I'm currently filtering for "javascript:" in the string, but I suspect that is not sufficient.
Upvotes: 0
Views: 1867
Reputation: 10981
This is taken from the Kohana framework's XSS filtering code. It is not a complete answer, but it might get you on the way.
// Remove javascript: and vbscript: protocols
$str = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $str);
$str = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $str);
$str = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $str);
// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$str = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#is', '$1>', $str);
$str = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#is', '$1>', $str);
$str = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#ius', '$1>', $str);
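The patterns above put [\x00-\x20]* between every letter because a plain substring test for "javascript:" is easy to slip past: URL schemes are case-insensitive, and browsers generally ignore tabs and newlines embedded in a URL. Here is a minimal Python sketch of that one idea. The function names are hypothetical, it checks a bare URL rather than a full attr=value pair, and it is a simplification, not a port of the Kohana code.

```python
import re

# Optional control characters or spaces (0x00-0x20) allowed between letters,
# mirroring the [\x00-\x20]* gaps in the PHP patterns above.
_GAP = r"[\x00-\x20]*"
_OBFUSCATED_JS = re.compile(_GAP.join("javascript") + _GAP + ":", re.IGNORECASE)


def naive_filter(url: str) -> bool:
    """Return True if a plain substring check considers the URL safe."""
    return "javascript:" not in url


def tolerant_filter(url: str) -> bool:
    """Return True if no case- or whitespace-obfuscated 'javascript:' is found."""
    return _OBFUSCATED_JS.search(url) is None


if __name__ == "__main__":
    for candidate in ("JaVaScRiPt:alert(1)", "java\tscript:alert(1)", "http://example.com/"):
        print(repr(candidate), naive_filter(candidate), tolerant_filter(candidate))
```

Even the tolerant version is still a blacklist. An entity-encoded form such as jav&#x61;script: inside an attribute would get past it, which is why this approach is only a starting point.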
Upvotes: 0
Reputation: 9728
Most modern server-side languages have an implementation of Markdown or another lightweight markup language, and many of these can turn URLs into clickable links.
Unless you have a lot of time to research this topic and implement the script yourself, I'd suggest finding the best Markdown implementation for your language and either studying its code or simply using it.
Markdown is usually shipped as a library; some implementations let you configure which elements they process and which they ignore. In your case you want to process URLs and ignore every other element.
Here's an (incomplete) list of solid Markdown implementations for different languages:
Upvotes: 1
Reputation: 887215
You need to attribute-encode the URLs.
You should also make sure that they start with http:// or https://.
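As a rough illustration of those two rules, here is a minimal Python sketch (Python is used only as an example, since the question does not name a language). The names `linkify` and `_URL_RE` and the deliberately loose URL pattern are assumptions made for this example. It links only candidates that start with http:// or https://, and it HTML-escapes everything it emits, including the value placed in the href attribute.

```python
import html
import re

# Loose candidate matcher: any "scheme:rest" token. Deciding what is safe to
# link is left to the allowlist check below, not to this pattern.
_URL_RE = re.compile(r"[a-z][a-z0-9+.-]*:\S+", re.IGNORECASE)
_ALLOWED_PREFIXES = ("http://", "https://")


def linkify(text: str) -> str:
    """Return text as HTML, with http(s) URLs wrapped in <a> tags."""
    parts = []
    last = 0
    for match in _URL_RE.finditer(text):
        # Escape the plain text between URL candidates.
        parts.append(html.escape(text[last:match.start()]))
        url = match.group(0)
        # html.escape with quote=True also encodes " and ', so the value
        # cannot break out of the double-quoted href attribute.
        safe = html.escape(url, quote=True)
        if url.lower().startswith(_ALLOWED_PREFIXES):
            parts.append(f'<a href="{safe}">{safe}</a>')
        else:
            # Not on the allowlist (e.g. javascript:), so emit it as text only.
            parts.append(safe)
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(linkify("see http://example.com/?a=1&b=2 now"))
    print(linkify("click javascript:alert(1)"))
```

The allowlist is the important design choice here: instead of trying to enumerate every dangerous scheme, anything that is not http or https is simply never turned into a link.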
Upvotes: 0