resu

Reputation: 1144

Regex for matching single-quoted strings fails with PHP

So I have this regex:

/'((?:[^\\']|\\.)*)'/

It is supposed to match single-quoted strings while allowing escaped single quotes (\') inside them.

It works here, but when executed with PHP, I get different results. Why is that?

Upvotes: 1

Views: 252

Answers (2)

wp78de

Reputation: 18950

This is a bit of an escaping hell. Although there is already an accepted answer, the original pattern is actually better. Why? It allows the escape character itself to be escaped. It can also be written with the "unrolling the loop" technique described by Jeffrey Friedl in "Mastering Regular Expressions": "([^\\"]*(?:\\.[^\\"]*)*)" (shown here for double quotes; the sample code adapts it for single quotes).

Demo

Unrolling the Loop (using double quotes)

"                              # the start delimiter
 ([^\\"]*                      # anything but the end of the string or the escape char
         (?:\\.                #     the escape char preceding an escaped char (any char)
               [^\\"]*         #     anything but the end of the string or the escape char
                      )*)      #     repeat
                             " # the end delimiter

Unrolling the loop does not get rid of the escaping hell, so here is the pattern with every backslash already escaped for a PHP single-quoted string:

Sample Code:

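// Unrolled-loop pattern for single quotes; each regex backslash is doubled for the PHP literal.
$re = '/\'([^\\\\\']*(?:\\\\.[^\\\\\']*)*)\'/';
// Two sample lines with escaped quotes and an escaped backslash before a closing quote.
$str = '\'foo\', \'can\\\'t\', \'bar\'
\'foo\', \' \\\'cannott\\\'\\\\\', \'bar\'
';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);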
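
One way to sidestep most of the backslash doubling is a nowdoc literal, which PHP passes through without any escape processing. The following is only a sketch of that idea, not part of the original answer: the nowdoc labels, the x flag, and the sample subject are assumptions chosen for illustration.

```php
<?php
// Nowdoc: PHP does no escape processing here, so the pattern is written
// exactly as PCRE sees it. The x flag permits whitespace and # comments.
$re = <<<'REGEX'
/
'                      # opening quote
(                      # group 1: the string body
  [^\\']*              # run of ordinary characters
  (?: \\. [^\\']* )*   # an escaped character, then more ordinary ones
)
'                      # closing quote
/x
REGEX;

// Sample subject (assumed for illustration), also written without PHP escaping.
$str = <<<'TEXT'
'foo', 'can\'t', 'end\\', 'bar'
TEXT;

preg_match_all($re, $str, $matches);
print_r($matches[1]);
```

Group 1 should hold the four string bodies, with escaped quotes and the escaped backslash left intact. Note that a / inside an x-mode comment would still end the pattern early, because PHP looks for the closing delimiter before PCRE strips comments.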

Upvotes: 3

Erayd

Reputation: 168

This might be easier using a negative lookbehind. Note also that you need to escape the backslashes twice: once to tell PHP that you want a literal backslash, and then again to tell the regex engine that you want a literal backslash.

Note also that your capturing expression (.*) is greedy - it will capture everything between ' characters, including other ' characters, whether they are escaped or not. If you want it to stop after the first unescaped ', use .*? instead. I have used the non-greedy version in my example below.

<?php

$test = "This is a 'test \' string' for regex selection";
$pattern = "/(?<!\\\\)'(.*?)(?<!\\\\)'/";

echo "Test data: $test\n";
echo "Pattern:   $pattern\n";

if (preg_match($pattern, $test, $matches)) {
    echo "Matches:\n";
    var_dump($matches);
}
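
The double escaping described above can be made visible by printing the pattern after PHP has processed the string literal. This is a minimal sketch that assumes the question's pattern was pasted into a single-quoted literal with only one level of backslashes; the variable names are hypothetical.

```php
<?php
// Assumption: the pattern from the question placed in a single-quoted literal
// with a single backslash level. PHP turns each \\ into \ before PCRE runs.
$naive = '/\'((?:[^\\\']|\\.)*)\'/';
echo $naive, "\n";   // what PCRE receives: /'((?:[^\']|\.)*)'/

// Doubling every regex backslash keeps \\ intact for PCRE.
$fixed = '/\'((?:[^\\\\\']|\\\\.)*)\'/';
echo $fixed, "\n";   // what PCRE receives: /'((?:[^\\']|\\.)*)'/

$subject = "'can\\'t'";   // the text 'can\'t'
preg_match($naive, $subject, $a);
preg_match($fixed, $subject, $b);
var_dump($a[1], $b[1]);
```

With the single level of escaping, the character class no longer excludes the backslash and `\.` only matches a literal dot, so the match stops at the escaped quote. With the doubled backslashes the whole string body is captured.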

Upvotes: 3
