Reputation: 555
So the regex for a quoted string has been solved over and over. A good answer can be found here: https://stackoverflow.com/a/5696141/692331
$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
This seems to be the standard solution for PHP.
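For reference, here is a minimal sketch of what that pattern is built for: backslash-escaped quotes inside a double-quoted literal. The sample input is made up for illustration and is not from the linked answer.

```php
<?php
// $re_dq is the backslash-escape pattern quoted above.
$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';

// Illustrative input using backslash escapes: echo "say \"hi\" now";
$subject = 'echo "say \"hi\" now";';

if (preg_match($re_dq, $subject, $m) === 1) {
    // Each \" is consumed by the \\. branch, so the whole literal matches.
    echo $m[0], "\n";
}
```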
My issue is that my quotes are escaped by another quote. Example:
="123 4556 789 ""Product B v.24"""
="00 00 F0 FF ""Licence key for blah blah"" hfd.34"
=""
The previous strings should match the following, respectively:
string '123 4556 789 ""Product B v.24""' (length=31)
string '00 00 F0 FF ""Licence key for blah blah"" hfd.34' (length=48)
string '' (length=0)
The examples given are just illustrations of what the strings may look like; they are not the actual strings I will be matching, which can number in the tens of thousands.
I need a regex pattern that will match a double-quoted string which may OR MAY NOT contain sequences of two double quotes.
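To illustrate the problem, here is a minimal sketch running the standard backslash-escape pattern against the first example line. Since `""` contains no backslash, the pattern closes the string at the first quote of each pair and splits the field into three separate matches.

```php
<?php
$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
$subject = '="123 4556 789 ""Product B v.24"""';

// Each "" pair is treated as a closing quote followed by a new opening quote.
$count = preg_match_all($re_dq, $subject, $m);
echo $count, "\n";
print_r($m[0]);
```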
UPDATE 5/5/14:
Upvotes: 0
Views: 121
Reputation: 555
I found that the pattern from zx81
$re_dq_answer = '/="(?:[^"]|"")*"/';
makes the engine save a backtracking position for every single matched character, because each character is a separate iteration of the alternation group. I found that I could adapt the pattern at the very top of my question to suit my needs.
$re_dq_original = '/="[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
becomes
$re_dq_modified = '/="([^"]*(?:""[^"]*)*)"/';
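A quick sketch checking $re_dq_modified against the three example lines from the question and collecting capture group 1 (the contents between the outer quotes). The loop and the $captured array are just illustrative scaffolding.

```php
<?php
// The three example lines from the question.
$lines = [
    '="123 4556 789 ""Product B v.24"""',
    '="00 00 F0 FF ""Licence key for blah blah"" hfd.34"',
    '=""',
];
$re_dq_modified = '/="([^"]*(?:""[^"]*)*)"/';

$captured = [];
foreach ($lines as $line) {
    // [^"]* eats a run of ordinary characters; each (?:""[^"]*) iteration
    // consumes one doubled quote plus the run that follows it.
    if (preg_match($re_dq_modified, $line, $m) === 1) {
        $captured[] = $m[1];
        var_dump($m[1]);
    }
}
```

The var_dump calls report lengths 31, 48 and 0, matching the expected output listed in the question.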
The 's' pattern modifier isn't necessary: it only makes the dot metacharacter match newlines as well, and the modified pattern contains no dot (the negated class [^"] already matches newlines).
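A small sketch of what the 's' modifier actually changes, using a made-up two-line field. Without 's', `.` stops at the newline; a negated class such as `[^"]` matches newlines either way, so $re_dq_modified handles multi-line fields without the modifier.

```php
<?php
// Illustrative field spanning two lines: ="first line<LF>second line"
$multi = "=\"first line\nsecond line\"";

$dotNoS   = preg_match('/="(.*)"/', $multi);                  // "." cannot cross the newline
$dotWithS = preg_match('/="(.*)"/s', $multi);                 // with 's', "." matches "\n" too
$classNoS = preg_match('/="([^"]*(?:""[^"]*)*)"/', $multi);   // [^"] matches "\n" without 's'

var_dump($dotNoS, $dotWithS, $classNoS);
```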
The longest string I have had to match was 28,000 characters, which caused Apache to crash with a stack overflow. I had to increase the stack size to 32 MB (the Linux default is 8 MB; on Windows it is 1 MB) just to get by! I didn't want every thread to have such a large stack, so I started looking for a better solution.
Example (tested on Regex101): a string of 3,200 characters that required 6,637 steps to match with $re_dq_answer now requires 141 steps with $re_dq_modified. A slight improvement, I'd say!
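When a pattern hits an engine limit, preg_match() returns false instead of a match, so it may be worth checking preg_last_error() when processing long fields. A minimal sketch, assuming a synthetic 28,000-character field (100 chunks of 278 "x" characters, each followed by a doubled quote) rather than real data:

```php
<?php
// Hypothetical payload: (278 + 2) * 100 = 28,000 characters.
$payload = str_repeat(str_repeat('x', 278) . '""', 100);
$subject = '="' . $payload . '"';

$re_dq_modified = '/="([^"]*(?:""[^"]*)*)"/';
$result = preg_match($re_dq_modified, $subject, $m);

if ($result === false || preg_last_error() !== PREG_NO_ERROR) {
    // preg_match() signals engine failures (e.g. a backtracking,
    // recursion or JIT stack limit) by returning false.
    echo 'Regex failed with error code ', preg_last_error(), "\n";
} else {
    echo 'Matched ', strlen($m[1]), " characters\n";
}
```

The group here iterates once per doubled quote (100 times) rather than once per character, which is where the unrolled pattern saves its work.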
Upvotes: 1
Reputation: 41848
Edit: Per your request, a minor modification to account for empty quotes.
(?<!")"(?:[^"]|"")*"
Original solution:
(?<!")"(?:[^"]|"")+"
Demo:
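<?php
$string = '
"123 4556 789 ""Product B v.24"""
"00 00 F0 FF ""Licence key for blah blah"" hfd.34"';
// (?<!")        the opening quote must not be preceded by another quote
// (?:[^"]|"")+  one or more non-quote characters or doubled "" pairs
$regex = '~(?<!")"(?:[^"]|"")+"~';
$count = preg_match_all($regex, $string, $m);
echo $count."<br /><pre>";
print_r($m[0]);
echo "</pre>";
?>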
Output:
2
Array
(
    [0] => "123 4556 789 ""Product B v.24"""
    [1] => "00 00 F0 FF ""Licence key for blah blah"" hfd.34"
)
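The only difference between the original and edited patterns is `+` versus `*`, which matters for an empty field. A small sketch comparing both on `=""`: with `+` the group needs at least one character or pair, so nothing matches (and the lookbehind stops the second quote from starting a match); with `*` the empty pair is matched.

```php
<?php
$subject = '=""';

// Original solution: at least one character or "" pair is required.
$plus = preg_match_all('~(?<!")"(?:[^"]|"")+"~', $subject, $mPlus);
// Edited solution: zero or more, so an empty "" field also matches.
$star = preg_match_all('~(?<!")"(?:[^"]|"")*"~', $subject, $mStar);

echo $plus, ' ', $star, "\n";
print_r($mStar[0]);
```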
Upvotes: 1