Georges Oates Larsen

Reputation: 7102

Fixed-length regex lookbehind complains of variable-length lookbehind

Here is the code I am trying to run:

$str = 'a,b,c,d';
return preg_split('/(?<![^\\\\][\\\\]),/', $str);

As you can see, the regexp being used here is:

/(?<![^\\][\\]),/

Which is a simple fixed-length negative lookbehind for "preceded by something that isn't a backslash, then something that is!".

This regex works just fine on http://www.phpliveregex.com

But when I go and actually attempt to run the above code, I am spat back the error:

Warning:  preg_split() [function.preg-split]: Compilation failed: lookbehind assertion is not fixed length at offset 13

To make matters worse, a fellow programmer tested the code on his 5.4.24 PHP server, and it worked fine.

This leads me to believe that my issues are related to the configuration of my server, which I have very little control over. I am told that my PHP version is 5.2.*

Are there any workarounds/alternatives to preg_split() that might not have this issue?

Upvotes: 6

Views: 3757

Answers (3)

raina77ow

Reputation: 106443

The problem is caused by the bug fixed in PCRE 6.7. Quoting the changelog:

A negated single-character class was not being recognized as fixed-length in lookbehind assertions such as (?<=[^f]), leading to an incorrect compile error "lookbehind assertion is not fixed length"

PCRE 6.7 was bundled with PHP 5.2.0, in Nov 2006. As you still hit this bug, your server evidently runs an older PCRE, so for a preg_split-based workaround you have to use a pattern without a negated character class. For example:

$patt = '/(?<!(?<!\\\\)\\\\),/';
// or...
$patt = '/(?<![\x00-\x5b\x5d-\xFF]\x5c),/';

However, I find the whole approach a bit weird: what if the comma is preceded by exactly three backslashes? Or five? Or any odd number of them? The comma in this case should be considered 'escaped', but obviously you cannot create a lookbehind expression of variable length to cover these cases.
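To make the odd-number problem concrete, here is a small sketch that runs the first workaround pattern against one and three backslashes. The variable names are mine, picked only for illustration.

```php
<?php
// Workaround pattern from above: a comma not preceded by a lone backslash.
$patt = '/(?<!(?<!\\\\)\\\\),/';

// One backslash before the comma: treated as escaped, so no split.
$one = preg_split($patt, 'a\\,b');

// Three backslashes before the comma: an escaped backslash followed by an
// escaped comma, so ideally no split either. The lookbehind only looks two
// characters back, sees "\\" and splits anyway.
$three = preg_split($patt, 'a\\\\\\,b');

var_dump(count($one));   // int(1)
var_dump(count($three)); // int(2)
```

So the fixed-length lookbehind gets one and two backslashes right, and goes wrong from three onwards.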

On second thought, one can use preg_match_all instead, with a common alternation trick to cover the escaped symbols:

$str = 'e ,a\\,b\\\\,c\\\\\\,d\\\\';
preg_match_all('/(?:[^\\\\,]|\\\\(?:.|$))+/', $str, $matches);
var_dump($matches[0]);

Demo.
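One thing to keep in mind with the preg_match_all approach: because the pattern needs at least one character per match, empty fields between two adjacent commas are not reported, whereas preg_split keeps them. A quick sketch of the difference (the input string is just an example of mine):

```php
<?php
// Same pattern as in the preg_match_all call above.
$patt = '/(?:[^\\\\,]|\\\\(?:.|$))+/';

// 'a,,b' has an empty field between the two commas.
preg_match_all($patt, 'a,,b', $matches);
$fields = $matches[0];              // the empty field is skipped

// A plain preg_split on comma keeps the empty field.
$split = preg_split('/,/', 'a,,b');

var_dump(count($fields)); // int(2)
var_dump(count($split));  // int(3)
```

If empty fields matter for your data, this is something you would have to handle separately.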

I really think I covered all the issues here; those trailing slashes were a killer :)

Upvotes: 3

Casimir et Hippolyte

Reputation: 89584

A way to avoid the negated character class (I write \x5c instead of a lot of backslashes, for clarity):

$result = preg_split('/(?<!(?!\x5c).\x5c),/s', $str);

About the approach itself:

If you are trying to split on commas that are not escaped, a lookbehind is the wrong tool, since you can't check an undefined number of backslashes before the comma. You have several possibilities to solve this problem:

$result = preg_split('/(?:[^\x5c]|\A)(?:\x5c.)*\K,/s', $str);

or

$result = preg_split('/(?<!\x5c)(?:\x5c.)*\K,/s', $str);

or for PHP > 5.2.4

$result = preg_split('/\x5c{2}(*SKIP)(?!)|(?<!\x5c),/s', $str);
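As far as I know, \K and (*SKIP) both need a PCRE 7.x library, so they may not be available on a server that still shows the old lookbehind bug. In that case a regex-free scan is an option. The following is only a sketch under that assumption; split_unescaped_commas is a hypothetical helper name, not a PHP built-in, and it keeps the backslashes in the output just like the preg_split versions do.

```php
<?php
// Hypothetical helper: split on commas that are not escaped by a backslash.
// Works byte by byte, which is safe for UTF-8 because "\" and "," are ASCII.
function split_unescaped_commas($str)
{
    $parts = array();
    $current = '';
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];
        if ($ch === '\\' && $i + 1 < $len) {
            // A backslash escapes the next character: copy both verbatim.
            $current .= $ch . $str[$i + 1];
            $i++;
        } elseif ($ch === ',') {
            // Unescaped comma: close the current field.
            $parts[] = $current;
            $current = '';
        } else {
            // Ordinary character (or a trailing lone backslash).
            $current .= $ch;
        }
    }

    $parts[] = $current;
    return $parts;
}

var_dump(split_unescaped_commas('a\\,b,c'));
```

Because pairs are consumed left to right, any number of backslashes before a comma is handled without a lookbehind.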

Upvotes: 1

Federico Piazza

Reputation: 31045

I think you are using an older PHP version, since your error arises on PHP 5.1.6 or lower.

You can check a non-working demo here


On the other hand it works for PHP 5.2.16 or higher:

Working demo
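To see what a given server actually runs, a short check like the one below may help. It is a sketch, not a diagnosis: I believe the PCRE_VERSION constant only exists from PHP 5.2.4 onwards, hence the defined() guard, and a PHP build can be linked against a system PCRE library that is older than the one bundled with that PHP release.

```php
<?php
// PHP_VERSION is always defined. PCRE_VERSION may be missing on older
// builds (assumption: it was added in PHP 5.2.4), so guard the lookup.
$phpVersion  = PHP_VERSION;
$pcreVersion = defined('PCRE_VERSION') ? PCRE_VERSION : 'unknown';

echo "PHP:  $phpVersion\n";
echo "PCRE: $pcreVersion\n";
```

If the constant is not defined, the PCRE section of phpinfo() should show the library version as well.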


Upvotes: 0
