Reputation: 1144
So I have this regex:
/'((?:[^\\']|\\.)*)'/
It is supposed to match single-quoted strings while ignoring internal, escaped single quotes \'
It works here, but when executed with PHP, I get different results. Why is that?
Upvotes: 1
Views: 252
Reputation: 18950
This is kind of escaping hell. Although there is already an accepted answer, the original pattern is actually better. Why? It allows escaping the escape character itself. It can also be written using the "unrolling the loop" technique described by Jeffrey Friedl in "Mastering Regular Expressions": "([^\\"]*(?:\\.[^\\"]*)*)" (adapted below for single quotes).
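To illustrate why escaping the escape character matters, here is a minimal sketch comparing the original pattern with a lookbehind-based alternative. The variable names and the sample subject are assumptions made for this example; the subject holds two quoted strings, the first of which ends in an escaped backslash.

```php
<?php
// Subject: 'a\\' (a string ending in an escaped backslash) followed by 'b'.
// A nowdoc keeps the text literal, so no PHP-level escaping is involved here.
$subject = <<<'TXT'
'a\\' 'b'
TXT;

// The original pattern from the question, escaped for a single-quoted PHP string.
$original = '/\'((?:[^\\\\\']|\\\\.)*)\'/';

// A lookbehind-based alternative: a quote counts as a delimiter
// only when it is not directly preceded by a backslash.
$lookbehind = '/(?<!\\\\)\'(.*?)(?<!\\\\)\'/';

preg_match_all($original, $subject, $a);
preg_match_all($lookbehind, $subject, $b);

var_dump($a[1]); // captures from the original pattern
var_dump($b[1]); // captures from the lookbehind pattern
```

With this input the original pattern should capture a\\ and b, while the lookbehind variant sees the quote after the escaped backslash as escaped and captures a\\' plus the following space as a single string.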
Unrolling the Loop (using double quotes)
" # the start delimiter
([^\\"]* # anything but the closing quote or the escape char
(?:\\. # the escape char followed by an escaped char (any char)
[^\\"]* # anything but the closing quote or the escape char
)*) # repeat
" # the end delimiter
This does not resolve the escaping hell, but that is covered here as well. The same pattern, adapted for single quotes and escaped for a single-quoted PHP string:
$re = '/\'([^\\\\\']*(?:\\\\.[^\\\\\']*)*)\'/';
$str = '\'foo\', \'can\\\'t\', \'bar\'
\'foo\', \' \\\'cannott\\\'\\\\\', \'bar\'
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
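As a possible way around the quoting noise, the annotated pattern can also be placed in a nowdoc and run with the x modifier, so the regex is typed exactly as the engine sees it. This is only a sketch: it assumes PHP 5.3 or later for nowdoc syntax, and the ~ delimiter and the sample string are choices made for this example.

```php
<?php
// A nowdoc (<<<'RE') passes its text through untouched, so each backslash
// below is written exactly as the regex engine should receive it.
// The x modifier ignores whitespace and treats # as the start of a comment.
$re = <<<'RE'
~
'                  # the start delimiter
(  [^\\']*         # anything but the closing quote or the escape char
   (?: \\.         # the escape char followed by any escaped char
       [^\\']*     # anything but the closing quote or the escape char
   )*              # repeat
)
'                  # the end delimiter
~x
RE;

// Sample subject, also kept literal with a nowdoc.
$str = <<<'TXT'
'foo', 'can\'t', 'bar \\'
TXT;

preg_match_all($re, $str, $matches);
var_dump($matches[1]);
```

The comments inside the pattern must not contain the delimiter character (~ here), because PHP looks for the closing delimiter before the regex engine ever sees the comments.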
Upvotes: 3
Reputation: 168
This might be easier using a negative lookbehind. Note also that you need to escape the backslashes twice: once to tell PHP that you want a literal backslash, and then again to tell the regex engine that you want a literal backslash.
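As a small sketch of that double escaping, the following writes the question's pattern in three PHP string forms and shows what happens when one escaping level is dropped. The variable names and the "naive" transcription are illustrative assumptions, not taken from the question.

```php
<?php
// The question's regex as the regex engine should receive it.
// A nowdoc (<<<'RE') is taken literally, so it serves as the reference.
$reference = <<<'RE'
/'((?:[^\\']|\\.)*)'/
RE;

// Single-quoted PHP string: \\ collapses to \ and \' to ', so every
// regex-level backslash pair has to be written as four backslashes.
$single = '/\'((?:[^\\\\\']|\\\\.)*)\'/';

// Double-quoted PHP string: \\ collapses to \ as well; ' needs no escape.
$double = "/'((?:[^\\\\']|\\\\.)*)'/";

// Both forms should produce exactly the reference pattern.
var_dump($single === $reference, $double === $reference);

// Hypothetical naive transcription: only the quotes were escaped for PHP.
// PHP hands the engine /'((?:[^\']|\.)*)'/, which no longer treats a
// backslash in the subject specially and matches \. as a literal dot.
$naive = '/\'((?:[^\\\']|\\.)*)\'/';

$subject = <<<'TXT'
'can\'t'
TXT;

preg_match($single, $subject, $good);
preg_match($naive, $subject, $bad);
var_dump($good[1], $bad[1]);
```

With the subject 'can\'t', the fully escaped pattern should capture can\'t, while the naive one stops at the escaped quote and captures only can\.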
Note also that your capturing expression (.*) is greedy - it will capture everything between ' characters, including other ' characters, whether they are escaped or not. If you want it to stop after the first unescaped ', use .*? instead. I have used the non-greedy version in my example below.
<?php
$test = "This is a 'test \' string' for regex selection";
$pattern = "/(?<!\\\\)'(.*?)(?<!\\\\)'/";
echo "Test data: $test\n";
echo "Pattern: $pattern\n";
if (preg_match($pattern, $test, $matches)) {
    echo "Matches:\n";
    var_dump($matches);
}
Upvotes: 3