Reputation: 3514
I have a security function which is part of a script. It's supposed to filter out malicious code from being executed in an input form. It works without a problem with normal characters from A-Z, but it rejects inputs with characters such as á, ñ, ö, etc.
What can I do so that form inputs with these characters are not rejected? Here is the function:
function add_special_chars($string, $no_quotes = FALSE)
{
$patterns = array(
"/(?i)javascript:.+>/",
"/(?i)vbscript:.+>/",
"/(?i)<img.+onload.+>/",
"/(?i)<body.+onload.+>/",
"/(?i)<layer.+src.+>/",
"/(?i)<meta.+>/",
"/(?i)<style.+import.+>/",
"/(?i)<style.+url.+>/"
);
$string = str_ireplace("&","&",$string);
if (!$no_quotes) $string = str_ireplace("'","'",$string);
$string = str_ireplace('"','"',$string);
$string = str_ireplace('<','<',$string);
$string = str_ireplace('>','>',$string);
$string = str_ireplace(' ',' ',$string);
foreach ($patterns as $pattern)
{
if(preg_match($pattern, $string))
{
$string = strip_tags($string);
}
}
$string = preg_replace('#(&\#*\w+)[\x00-\x20]+;#u', "$1;", $string);
$string = preg_replace('#(&\#x*)([0-9A-F]+);*#iu', "$1$2;", $string);
$string = html_entity_decode($string, ENT_COMPAT, LANG_CODEPAGE);
$string = preg_replace('#(<[^>]+[\x00-\x20\"\'\/])(on|xmlns)[^>]*>#iUu', "$1>", $string);
$string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iUu', '$1=$2nojavascript...', $string);
$string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iUu', '$1=$2novbscript...', $string);
$string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*-moz-binding[\x00-\x20]*:#Uu', '$1=$2nomozbinding...', $string);
$string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*data[\x00-\x20]*:#Uu', '$1=$2nodata...', $string);
$string = preg_replace('#(<[^>]+[\x00-\x20\"\'\/])style[^>]*>#iUu', "$1>", $string);
$string = preg_replace('#</*\w+:\w[^>]*>#i', "", $string);
do
{
$original_string = $string;
$string = preg_replace('#</*(applet|meta|xml|blink|link|embed|object|iframe|frame|frameset|ilayer|layer|bgsound|title|base)[^>]*>#i', "", $string);
}
while ($original_string != $string);
return $string;
}
UPDATE: I found that the following line seems to be causing the problem, but not sure why:
$string = preg_replace('#(<[^>]+[\x00-\x20\"\'\/])style[^>]*>#iUu', "$1>", $string);
Upvotes: 1
Views: 1423
Reputation: 67019
This is a bad idea. The worst part of your function is the htmlentity_decode()
half way though, which undermines the first 1/2 of this function entirely. The attacker can just encode the quote marks and brackets, and you'll just build the payload for the attacker. strip_tags()
is a joke, and is not a good way to protect against XSS. The main problem with this function is that it is far too simple. HTMLPurifer is made up of thousands of regular expressions and it does a much better job, but it isn't perfect.
You are hardly addressing the most common forms of XSS. XSS is an output problem, you can't expect to pass all input though some magical function and assume its safe. XSS depends on how it is used.
Without actually running your code i think something like this would bypass it:
<a href='javA%3bS%3bcript:%3balert(1)'>so very broken</a>
or maybe even something more simplistic:
<img src=x onerror=alert(1) />
Like I said this is a gross oversimplification of a extremely complex problem.
Upvotes: 4