Hamid Sarfraz

Reputation: 1135

preg_match_all( ) behaves differently on different servers

The code below works perfectly on XAMPP on my PC, but does not work on my newly bought VPS, where it crashes my script.

preg_match_all( "/$regex/siU" , $string , $matches , PREG_SET_ORDER );

This is expected to simply fetch links and titles from HTML.

A similar regex problem occurred earlier today. The code ran fine on my local server but produced a "Connection Was Reset" error on the VPS. The cause was some commented-out HTML (with PHP code inside it), which the code below is supposed to remove in order to optimize the output. The connection reset is now resolved, but the HTML still has comments in the browser source.

$string = preg_replace( '/<!--(.|\s)*?-->/' , ''    , $string );

So the problem is clear: these regex functions are not working properly. But I do not know the solution.

Can anyone help me solve this?

Solved:

Thanks to https://stackoverflow.com/a/12761686/369005 @vimishor

Upvotes: 4

Views: 1320

Answers (5)

Alan Moore

Reputation: 75242

So the root problem is that the code that's supposed to remove HTML comments isn't working? That's probably because the regex that's supposed to match the comments uses (.|\s)* to work around the fact that . doesn't match newlines. That's almost guaranteed to cause problems, as this answer explains.

The correct way to match anything-including-newlines is to use the s modifier. For example:

'/<!--.*?-->/s'

That turns on single-line mode (also known as DOTALL mode), which allows the . to match newlines. (The author of that other question had to use [\S\s] instead, because JavaScript has no equivalent for single-line/DOTALL mode.)
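As a minimal sketch of that pattern in use (the sample $string is made up for illustration), the replacement can also be checked for failure: preg_replace() returns null when PCRE reports an error, and preg_last_error() gives the error code.

```php
<?php
// Hypothetical sample input; the comment spans several lines.
$string = "<div>\n<!--\n  comment here\n-->\n<p>kept</p>\n</div>";

// The s modifier lets . match newlines, so no (.|\s) alternation is needed.
$clean = preg_replace('/<!--.*?-->/s', '', $string);

if ($clean === null) {
    // null means the regex engine failed (for example, a backtrack limit
    // was hit); preg_last_error() returns one of the PREG_*_ERROR constants.
    error_log('PCRE error code: ' . preg_last_error());
    $clean = $string; // fall back to the unmodified input
}

echo $clean;
```

A null return that is assigned straight back to $string would silently discard the page, so checking for it may help when the same code behaves differently on two servers.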

Upvotes: 1

jeroen

Reputation: 91762

It seems the problem is you are misunderstanding what html comments do. According to your comment below your question, the problem is that html comments were not removed, causing php to run with the wrong parameters.

However, html comments have no influence on php code that is or is not run, only on what the browser displays (and runs in case of javascript). Your php code is run before the output gets to the browser.

If you want to comment PHP code out, you will need to put it in a /* */ block or start each line with //.
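A minimal sketch of the difference, assuming a hypothetical helper record_call() that logs each time it runs: the call inside the HTML comment still executes on the server, while the call inside /* */ does not.

```php
<?php
// Hypothetical helper, used only to show whether the call happens.
function record_call(array &$log): string
{
    $log[] = 'called';
    return 'value';
}

$log = [];

// Capture the template output so it can be inspected afterwards.
ob_start();
?>
<!-- <?php echo record_call($log); ?> -->
<?php /* echo record_call($log); */ ?>
<?php
$output = ob_get_clean();

// $log holds one entry: only the call inside the HTML comment ran,
// and its return value ended up inside the comment in $output.
echo count($log), PHP_EOL;
echo $output;
```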

Upvotes: 0

Ja͢ck

Reputation: 173642

Let me stop you there for a second. Parsing HTML with regular expressions is a bad idea, unless it's a very isolated issue on a malformed document. You will want to use a proper parser; for instance, here's an example that strips HTML comments:

$html = <<<EOM
<html>
<body>
<div id="test">
<!--
comment here
-->
</div>
</body>
</html>
EOM;

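// Parse the markup into a DOM tree.
$d = new DOMDocument;
$d->loadHTML($html);

// XPath's comment() node test selects every comment node in the document.
$x = new DOMXPath($d);

foreach ($x->query('//comment()') as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize the tree back to HTML, now without the comments.
echo $d->saveHTML();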
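
A similar sketch could cover the original goal of fetching links and titles. It assumes "title" means each link's visible text; the sample markup and the $links array are made up for illustration.

```php
<?php
// Hypothetical sample markup with two links.
$html = '<html><body><a href="/a">First</a><p><a href="/b">Second</a></p></body></html>';

$d = new DOMDocument;
$d->loadHTML($html);

$links = [];

// getElementsByTagName() returns every <a> element in document order.
foreach ($d->getElementsByTagName('a') as $a) {
    $links[] = [
        'href'  => $a->getAttribute('href'),
        'title' => trim($a->textContent), // assumption: title = link text
    ];
}

print_r($links);
```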

Upvotes: 1

Alexandru Guzinschi

Reputation: 5726

It is a known fact that PCRE sometimes has problems with text larger than 200 lines. Developers of Drupal and GeSHi have been hit by this problem in the past.

References:

  1. Drupal PCRE Issue @ March 23, 2012
  2. GeSHi PCRE Issue @ February 02, 2012

It may help to split the text into small chunks (100 lines, for example) and run the regex on each chunk.
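
A rough sketch of that chunking idea follows. The function name replace_in_chunks() and its parameters are hypothetical, and the approach assumes no match needs to span a chunk boundary; a comment that starts in one chunk and ends in the next would be left in place.

```php
<?php
// Hypothetical helper: run preg_replace() on fixed-size groups of lines.
function replace_in_chunks(
    string $pattern,
    string $replacement,
    string $text,
    int $linesPerChunk = 100
): string {
    $lines = explode("\n", $text);
    $result = [];

    foreach (array_chunk($lines, $linesPerChunk) as $chunk) {
        $piece = implode("\n", $chunk);
        $replaced = preg_replace($pattern, $replacement, $piece);

        // preg_replace() returns null on a PCRE error; keep the
        // original chunk in that case rather than losing text.
        $result[] = $replaced === null ? $piece : $replaced;
    }

    return implode("\n", $result);
}

// Usage with made-up input: 250 lines, each containing a short comment.
$text = implode("\n", array_fill(0, 250, 'a <!-- x --> b'));
$out = replace_in_chunks('/<!--.*?-->/s', '', $text, 100);
```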

Upvotes: 2

Markus Poerschke

Reputation: 97

Try this:

$string = preg_replace( '/.*<!--(.|\s)*?-->.*/' , ''    , $string );

Some regex implementations will execute your regular expression as if it were anchored, like this: /^<!--(.|\s)*?-->$/. So your expression may behave differently on different servers.

Upvotes: -1
