Ryan Steffer

Reputation: 434

PHP RegEx not matching a string that it should match

This is driving me insane...

I have the following code:

    # open pdf
    $pdf = file_get_contents('myfile.pdf');

    echo("RE 1:\n");
    preg_match('/^[0-9]+ 0 obj.*\/Contents \[ ([0-9]+ [0-9]+) R \\]/msU', $pdf, $m);
    var_dump($m);

    echo("\nRE 2:\n");
    preg_match('/^8 0 obj.*\/Contents \[ ([0-9]+ [0-9]+) R \\]/msU', $pdf, $m);
    var_dump($m);

The file myfile.pdf contains the following text:

    ...
    8 0 obj
    <<
    /Type /Page
    /Parent 2 0 R
    /Resources 6 0 R
    /Contents [ 5 0 R ]
    >>
    endobj
    ...

The only difference between the two regular expressions is at the start of the pattern: RE 1 uses the character class [0-9]+ where RE 2 has the literal 8. Yet I get the following output:

    RE 1:
    array(0) {
    }

    RE 2:
    array(2) {
      [0]=>
      string(78) "8 0 obj
    <<
    /Type /Page
    /Parent 2 0 R
    /Resources 6 0 R
    /Contents [ 5 0 R ]"
      [1]=>
      string(3) "5 0"
    }

I would expect both regular expressions to return similar results, but the regular expression with the numeric range at the start (RE 1) doesn't return any results. Is this a bug or am I doing something wrong?

Update

After adding preg_last_error(), I am getting PREG_BACKTRACK_LIMIT_ERROR. How can I fix that?
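
For anyone reproducing this, here is a minimal sketch of the kind of check that surfaces the error. The helper name match_with_diagnostics is hypothetical; it only wraps preg_match() and preg_last_error(), both standard PHP functions.

```php
<?php
// Hypothetical helper: runs preg_match() and names the failure, if any.
function match_with_diagnostics(string $pattern, string $subject): array
{
    $result = preg_match($pattern, $subject, $m);

    if ($result === false) {
        // preg_match() returns false on error; preg_last_error() says which one.
        $code = preg_last_error();
        $name = $code === PREG_BACKTRACK_LIMIT_ERROR
            ? 'PREG_BACKTRACK_LIMIT_ERROR'
            : "error code $code";

        return ['ok' => false, 'error' => $name, 'match' => []];
    }

    // $result is 1 on a match and 0 when nothing matched; neither is an error.
    return ['ok' => true, 'error' => null, 'match' => $m];
}
```

An empty $m on its own does not distinguish "no match" from "the engine gave up", which is why the return value of preg_match() is compared against false here.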

Upvotes: 1

Views: 471

Answers (1)

Emma

Reputation: 27723

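My guess is that you are after an expression along these lines, used with the s flag so that . also matches newlines:

    [0-9]+\s+0\s+obj\b.*?\/Contents\s+\[\s*([0-9]+\s+[0-9]+)\s+R\s*\]

It uses a lazy .*? instead of the U flag and \s+ / \s* instead of literal spaces, so it tolerates varying whitespace.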

Test

    $re = '/[0-9]+\s+0\s+obj\b.*?\/Contents\s+\[\s*([0-9]+\s+[0-9]+)\s+R\s*\]/s';
    $str = '8 0 obj
    <<
    /Type /Page
    /Parent 2 0 R
    /Resources 6 0 R
    /Contents [ 5 0 R ]
    >>
    endobj

    8 0 obj
    <<
    /Type /Page
    /Parent 2 0 R
    /Resources 6 0 R
    /Contents [ 5 0 R ]
    >>
    endobj';

    preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

    var_dump($matches);
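
Regarding the PREG_BACKTRACK_LIMIT_ERROR from the update: one possible way to keep the lazy .*? from scanning large parts of the file is to cut the input into objects first and match inside each one. The sketch below is an assumption on my part, not a tested fix for your file. find_contents_refs is a hypothetical helper, and it assumes every object ends with the literal endobj, as in the sample.

```php
<?php
// Hypothetical helper: maps object number => "N G" reference found in /Contents.
function find_contents_refs(string $pdf): array
{
    // Same idea as the expression above, anchored to a line start (m flag)
    // and with the object number captured as group 1.
    $re = '/^([0-9]+)\s+0\s+obj\b.*?\/Contents\s+\[\s*([0-9]+\s+[0-9]+)\s+R\s*\]/ms';

    $refs = [];
    // Assumption: "endobj" terminates each object, so each chunk holds one object
    // and .*? can never run past the end of that object.
    foreach (explode('endobj', $pdf) as $chunk) {
        // preg_match() returns 1 on a match, 0 on no match, false on error;
        // chunks that fail (for example very large streams) are skipped.
        if (preg_match($re, $chunk, $m) === 1) {
            $refs[$m[1]] = $m[2];
        }
    }

    return $refs;
}

$sample = "8 0 obj\n<<\n/Contents [ 5 0 R ]\n>>\nendobj\n"
        . "9 0 obj\n<<\n/Contents [ 7 0 R ]\n>>\nendobj";
var_dump(find_contents_refs($sample));
```

Raising pcre.backtrack_limit with ini_set() is another option, at the cost of letting expensive matches run longer.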

The expression is explained in the top-right panel of regex101.com, should you wish to explore, simplify, or modify it. There you can also watch how it matches against sample inputs.

RegEx Circuit

jex.im visualizes regular expressions.

Upvotes: 1
