Francisco Presencia

Reputation: 8860

preg_match() not working in some cases

This feels like it should be an easy "change one comma" fix. I've done my research and tried many different things, but nothing seems to work. Here is the code I used to try to debug it:

/* More code before */

$Test = "This is a test <ul>TEST</ul> Blabla";
$Real = $Data['chapters']['introduction'];
var_dump($Real);
echo "\n\n";

preg_match('/<ul>(.*)<\/ul>/', $Test, $VarTest);
var_dump($VarTest);
echo "\n\n";

preg_match('/<ul>(.*)<\/ul>/', $Real, $VarReal);
var_dump($VarReal);

The result is this:

string(1888) "<p>The <b>theory of relativity</b>, or simply <b>relativity</b>, generally encompasses two theories of <a href="http://en.wikipedia.org/wiki/Albert_Einstein" title="Albert Einstein">Albert Einstein</a>: <a href="http://en.wikipedia.org/wiki/Special_relativity" title="Special relativity">special relativity</a> and <a href="http://en.wikipedia.org/wiki/General_relativity" title="General relativity">general relativity</a>. Concepts introduced by the theories of relativity include:</p>
<ul>
  <li>
    <p>Measurements of various quantities are <i>relative</i> to the velocities of observers. In particular, space and time can <a href="http://en.wikipedia.org/wiki/Time_dilation" title="Time dilation">dilate</a>.</p>
  </li>
  <li>
    <p><a href="http://en.wikipedia.org/wiki/Spacetime" title="Spacetime">Spacetime</a>: space and time should be considered together and in relation to each other.</p>
  </li>
  <li>
    <p>The speed of light is nonetheless invariant, the same for all observers.</p>
  </li>
</ul>
<p>The term &quot;theory of relativity&quot; was based on the expression &quot;relative theory&quot; (<a href="http://en.wikipedia.org/wiki/German_language" title="German language">German</a>: <span lang="de"><i>Relativtheorie</i></span>) used by <a href="http://en.wikipedia.org/wiki/Max_Planck" title="Max Planck">Max Planck</a> in 1906, who emphasized how the theory uses the <a href="http://en.wikipedia.org/wiki/Principle_of_relativity" title="Principle of relativity">principle of relativity</a>. In the discussion section of the same paper <a href="http://en.wikipedia.org/wiki/Alfred_Bucherer" title="Alfred Bucherer">Alfred Bucherer</a> used for the first time the expression &quot;theory of relativity&quot; (<a href="http://en.wikipedia.org/wiki/German_language" title="German language">German</a>: <span lang="de"><i>Relativit&auml;tstheorie</i></span>).</p>
"

array(2) {
  [0]=>
  string(13) "<ul>TEST</ul>"
  [1]=>
  string(4) "TEST"
}


array(0) {
}

Any idea why the last array is empty (when it should contain the 3 list elements)?

Some more info: the text is retrieved from MySQL using PDO. I've tried escaping it (for the quotes) and replacing the quotes, and I've checked that the text is well below the preg_match() string limit. I just cannot find where the problem is. I think the code shows where specifically the problem is, but I'd gladly perform any tests you need. Thanks.
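One way to rule out the string-limit theory is to look at what preg_match() actually returns: 0 means the pattern simply did not match, while false means PCRE gave up with an error (for example, hitting pcre.backtrack_limit). A minimal sketch, using a hypothetical helper `matchOrExplain()`; preg_last_error_msg() assumes PHP 8.0 or later:

```php
<?php
// Hypothetical helper: tells "no match" (0) apart from a PCRE failure (false).
function matchOrExplain(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject, $m);
    if ($result === false) {
        // A real error, e.g. a backtrack or recursion limit was hit (PHP 8.0+).
        return 'error: ' . preg_last_error_msg();
    }
    return $result === 1 ? 'match: ' . $m[1] : 'no match';
}

// Single-line subject, like $Test in the question.
echo matchOrExplain('/<ul>(.*)<\/ul>/', 'This is a test <ul>TEST</ul> Blabla'), "\n"; // match: TEST

// Multi-line subject, like $Real: a clean 0 from preg_match, not an error.
echo matchOrExplain('/<ul>(.*)<\/ul>/', "<ul>\n  <li>one</li>\n</ul>"), "\n";       // no match
```

If the second call reports "no match" rather than "error: ...", the subject size is not the problem; the pattern itself just never matches.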

Upvotes: 1

Views: 398

Answers (3)

Spudley

Reputation: 168783

The biggest problem you have here is that you are trying to parse HTML code using regex. Even if you can get it to work with the data you have, as soon as the data contains nested <ul> tags your regex will break, and at that point it becomes extremely difficult to get it working. Parsing HTML really ought to be done with a DOM parser (i.e. PHP's DOMDocument class). Regex is the wrong tool for the job.
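For comparison, here is a rough sketch of the DOMDocument approach, collecting the text of each <li> in the first <ul>. The function name `listItems()` is hypothetical, and it assumes PHP's dom extension is enabled (it usually is by default):

```php
<?php
// Sketch: extract list items with DOMDocument instead of a regex.
function listItems(string $html): array
{
    $doc = new DOMDocument();
    // HTML fragments can trigger libxml warnings; collect them quietly instead of printing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $items = [];
    $ul = $doc->getElementsByTagName('ul')->item(0);
    if ($ul === null) {
        return $items; // no <ul> in the input
    }
    // Note: this also picks up <li> elements of any nested lists.
    foreach ($ul->getElementsByTagName('li') as $li) {
        $items[] = trim($li->textContent);
    }
    return $items;
}

$html = "<p>Intro</p>\n<ul>\n  <li>\n    <p>First</p>\n  </li>\n  <li><p>Second</p></li>\n</ul>";
print_r(listItems($html));
```

Because the parser works on the document tree, line breaks inside the list are irrelevant here; there is no equivalent of the regex "dot vs. newline" issue.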

That said, if you must do it with regex, you need to use the s modifier, because the input spans multiple lines. This modifier changes the behaviour of the dot character in the regex so that it also matches line feed characters.

So your final pattern needs to look like this:

preg_match('/<ul>(.*)<\/ul>/s', $Real, $VarReal);
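A minimal demonstration of the difference, using a short made-up subject shaped like $Real (a <ul> spread over several lines):

```php
<?php
// Without /s, "." stops at "\n", so a <ul> spanning several lines never matches.
// With /s, "." also matches "\n" and the whole list is captured.
$real = "<p>Intro</p>\n<ul>\n  <li>A</li>\n  <li>B</li>\n</ul>\n<p>Outro</p>";

var_dump(preg_match('/<ul>(.*)<\/ul>/', $real, $without)); // int(0)
var_dump(preg_match('/<ul>(.*)<\/ul>/s', $real, $with));   // int(1)
echo $with[1];                                             // the inner list, newlines included
```

Note that `.*` is greedy: with more than one list in the subject, it would run to the last </ul>, which is another reason the DOM approach is more robust.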

Hope that helps.

Upvotes: 3

Francisco Presencia

Reputation: 8860

My code came from slightly modifying some SO answers, but I found the solution by checking some other answers after seeing Patrice Levesque's. I added the 's' modifier to the pattern, according to this question:

preg_match('/<ul>(.*)<\/ul>/s', $Real, $VarReal);
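The pattern above captures the whole inner list as one string. If the goal is the 3 separate list elements, one option (not from the answers above, and still subject to the nested-list caveat) is preg_match_all() with /s plus a lazy `.*?`, so each match stops at the nearest </li>. A sketch on a shortened, made-up subject:

```php
<?php
// /s lets "." cross line breaks; the lazy ".*?" stops at the first </li>
// instead of running on to the last one.
$real = "<ul>\n  <li>\n    <p>One</p>\n  </li>\n  <li>\n    <p>Two</p>\n  </li>\n  <li>\n    <p>Three</p>\n  </li>\n</ul>";

preg_match_all('/<li>(.*?)<\/li>/s', $real, $matches);
// $matches[1] holds the captured group for every match; strip the surrounding whitespace.
$items = array_map('trim', $matches[1]);
print_r($items); // <p>One</p>, <p>Two</p>, <p>Three</p>
```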

Upvotes: 1

Patrice Levesque

Reputation: 2114

Your subject string in the second case spans multiple lines. Append "m" to the pattern's modifiers:

preg_match('/<ul>(.*)<\/ul>/m', $Real, $VarReal);
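Worth checking before relying on this: in PCRE, /m changes what ^ and $ match (the start and end of every line instead of the whole string); it does not let "." match a newline, which is what /s does. A small test on a made-up multi-line subject:

```php
<?php
$real = "<ul>\n  <li>A</li>\n</ul>";

// "." still stops at "\n" under /m, so this does not match.
var_dump(preg_match('/<ul>(.*)<\/ul>/m', $real));  // int(0)
var_dump(preg_match('/<ul>(.*)<\/ul>/s', $real));  // int(1)

// What /m actually changes: ^ and $ anchor at each line, not just the whole string.
var_dump(preg_match_all('/^\s*<li>.*$/', $real));  // int(0)
var_dump(preg_match_all('/^\s*<li>.*$/m', $real)); // int(1)
```

This matches the asker's own follow-up, where adding 's' is what made the pattern work.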

Upvotes: 2
