Alexander Buch
Alexander Buch

Reputation: 31

preg_replace returns null

So I got this one out of a string-sanitizing/xss-filter class buried somewhere deep inside a module of the system I'm working with.

The client called in and said "my form isn't posting when I put <> signs in it". Tried it, worked. He then send me the specific text, that was to be submitted and presto! I've got the same error.

Rocking my XDebug, I isolated the problem to occur exactly at the aforementioned string-sanitizing class when it's trying to remove typical JavaScript event-handlers from the text by using preg_replace.

The code goes something like $str = preg_replace($prettyLongRegex, $replaceMent, $str); Naturally, I expect $str to just stay as it was if nothing matches. What happens though, is that the whole expression returns NULL, effectively making my $str an empty string.

Boom, the text is gone, the form doesn't validate, the error is being printed…

While fiddling around with the parameters, I realized that another text i.e. $str or less handlers within the text work out just fine. It is this exact constellation which is leading to my problem.

My question: Why does it return NULL?

Here is a simple script which shows the problem by printing NULL:

<?php

$str = <<< EOF
Aim

To compare the clinical evolution of >70% asymtomatic in-stent  restenosis  in superficial femoral artery. (SFA) treated with endovascular procedure or conservative treatment.

Methods

Historical cohort  study was performed in patients with femoral artery stent with an asymptomatic  in-stent restenosis of 70-99%. Two groups: Conservative managment vs endovascular treatment.

In both groups we compared: Limb salvage  and critical ischemia or limited claudication free survival rates ( <250m) , from initial stent  treatment until  the appearance of critical ischemia o major amputation  ( Log-Rank, Kaplan Meier).

Primary permeability and  assisted primary permeability  were analized.


Results

From January 2010-December 2015, twenty three  >70%  in-stent restenosis were diagnosed  in 20 patients  with an average age of 78 years old (SD 9,6). Average follow-up was of  30,1 months. 12 patients (52,2%) received medical  (Aspirin and Statins) and  endovascular treatment.  11 patients (47,8%) received only medical treatment without  an endovascular procedure.

No difference were seen in limb salvage.

Critical ischemia  or limited claudication free survival rates were higher in the conservative treatment group vs  endovascular group.  (P=0,031)

Primary permeability  was 14,2 months ( IC 8,2-20.2). Assisted primary permeability, 65,5 months ( IC 57,4-77,5).

Conclusions

In our experience treatment of  severe asymtomatic lesions of  in-stent stensosis in SFA does not reduce the risk of amputation, critical ischaemia o limitant claudication. In asymptomatic patients, conservative treatment could be a good option, without increasing the risk in amputation and critical ischema.
EOF;

$str = preg_replace('#(<[^>]+.*?)(onabort|onactivate|onafterprint|onafterupdate|onbeforeactivate|onbeforecopy|onbeforecut|onbeforedeactivate|onbeforeeditfocus|onbeforepaste|onbeforeprint|
        onbeforeunload|onbeforeupdate|onblur|onbounce|oncanplay|oncanplaythrough|oncellchange|onchange|
        onclick|oncontextmenu|oncontrolselect|oncopy|oncuechange|oncut|ondataavailable|ondatasetchanged|
        ondatasetcomplete|ondblclick|ondeactivate|ondrag|ondragend|ondragenter|ondragleave|ondragover|
        ondragstart|ondrop|ondurationchange|onemptied|onended|onerror|onerrorupdate|onfilterchange|
        onfinish|onfocus|onfocusin|onfocusout|onformchange|onforminput|onhashchange|onhelp|oninput|oninvalid,|onkeydown|
        onkeypress|onkeyup|onlayoutcomplete|onload|onloadeddata|onloadedmetadata|onloadstart|
        onlosecapture|onmessage|onmousedown|onmouseenter|onmouseleave|onmousemove|onmouseout|
        onmouseover|onmouseup|onmousewheel|onmove|onmoveend|onmovestart|onoffline|ononline|
        onpagehide|onpageshow|onpaste|onpause|onplay|onplaying|onpopstate|onprogress|
        onpropertychange|onratechange|onreadystatechange|onredo|onreset|onresize|onresizeend|
        onresizestart|onrowenter|onrowexit|onrowsdelete|onrowsinserted|onscroll|onseeked|onseeking|
        onselect|onselectionchange|onselectstart|onshow|onstalled|onstart|onstop|onstorage|onsubmit|
        onsuspend|ontimeupdate|onundo|onunload|onvolumechange|onwaiting)[^>]*>#iU',"\\1>",$str);

var_dump($str);

My system is OSX 10.11.4 and

PHP 5.6.17 (cli) (built: Jan  8 2016 10:27:48)
Copyright (c) 1997-2015 The PHP Group
Zend Engine v2.6.0, Copyright (c) 1998-2015 Zend Technologies
with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2015, by Zend Technologies
with Xdebug v2.3.3, Copyright (c) 2002-2015, by Derick Rethans

Upvotes: 1

Views: 683

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626758

See the preg_replace reference:

If matches are found, the new subject will be returned, otherwise subject will be returned unchanged or NULL if an error occurred.

The error is catastrophic backtracking - see your regex demo.

To fix this, you need to either remove the [^>]+ at the start or keep [^>]+ and remove .*? (depending on what you need to match). The point is that [^>]+ and .*? can match at the same location and since the patterns are followed with a huge alternation group (where all alternatives start with the same on substring letting all of them match at the same location), this creates a huge amount of possible variations that the regex engine has to check before admitting there is a failure - leading to catastrophic backtracking.

You can also greatly speed up the pattern if you put on outside the (...):

(<[^>]+)on(abort|activate|afterprint|afterupdate|beforeactivate|beforecopy|beforecut|beforedeactivate|beforeeditfocus|beforepaste|beforeprint|
        onbeforeunload|beforeupdate|blur|bounce|canplay|canplaythrough|cellchange|change|
        onclick|contextmenu|controlselect|copy|cuechange|cut|dataavailable|datasetchanged|
        ondatasetcomplete|dblclick|deactivate|drag|dragend|dragenter|dragleave|dragover|
        ondragstart|drop|durationchange|emptied|ended|error|errorupdate|filterchange|
        onfinish|focus|focusin|focusout|formchange|forminput|hashchange|help|input|invalid,|keydown|
        onkeypress|keyup|layoutcomplete|load|loadeddata|loadedmetadata|loadstart|
        onlosecapture|message|mousedown|mouseenter|mouseleave|mousemove|mouseout|
        onmouseover|mouseup|mousewheel|move|moveend|movestart|offline|online|
        onpagehide|pageshow|paste|pause|play|playing|popstate|progress|
        onpropertychange|ratechange|readystatechange|redo|reset|resize|resizeend|
        onresizestart|rowenter|rowexit|rowsdelete|rowsinserted|scroll|seeked|seeking|
        onselect|selectionchange|selectstart|show|stalled|start|stop|storage|submit|
        onsuspend|timeupdate|undo|unload|volumechange|waiting)[^>]*>

See the regex demo

NOTE: If you need to match up to the first on..., turn (<[^>]+) into a lazy subpattern - (<[^>]+?), and perhaps you also would like to add a word boundary before on: (<[^>]+?)\bon.

Upvotes: 6

Related Questions