Reputation: 41
I am trying to match quoted strings within a piece of text, allowing for escaped quotes inside them as well. I tried this regular expression in an online tester, and it works perfectly. However, when I use it with preg_match_all, it fails at the first escaped quote.
Here is the code:
$parStr = 'title="My Little Website"
year="2007"
description="Basic website with ..."
tech="PHP, mySQL"
link="<a href=\"http://test.com\">test.com</a>"
';
$matches = array();
preg_match_all('/(\w+)\s*=\s*"(([^\\"]*(\\.)?)*)"/', $parStr, $matches, PREG_SET_ORDER); // $matches[4][0] is 'link="<a href=\"'
It fails on the last match, only matching up until the first escaped quote. When I try this expression at http://www.regexplanet.com/simple/index.html, it works perfectly.
The pertinent part of the regex is:
"(([^\\"]*(\\.)?)*)"
This should consume all text up to a backslash or a quote, then optionally consume one escaped character. That process repeats zero or more times until an unescaped quote is found, at which point the match is complete.
Why does this not work in PHP, and how should it be fixed?
Upvotes: 2
Views: 1172
Reputation: 41
I do not know why it doesn't work for one particular version of PHP, but using the idea of a non-greedy match, I came up with this pattern, which does work:
"(.*?[^\\\])"
It non-greedily matches everything until it reaches a double quote that is not preceded by an escape character. Oddly, three backslashes are needed, or PHP complains about an unmatched bracket. I suspect that with only two, a backslash ends up preceding the bracket and escaping it, but I am not sure. Can anyone confirm why three backslashes are needed?
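A likely explanation, sketched under the assumption that the pattern sits in a single-quoted PHP string: PHP collapses \\ to a single backslash before PCRE ever compiles the pattern, while a backslash in front of ] is left untouched. The variable names below are only for illustration.

```php
<?php
// Assumption: patterns are written in single-quoted PHP strings.
// There, \\ becomes one backslash, and a backslash before any other character is kept.
$two   = '[^\\]';    // PCRE receives [^\]  : the bracket is escaped, so the class never closes
$three = '[^\\\]';   // PCRE receives [^\\] : a class that excludes one backslash
$four  = '[^\\\\]';  // PCRE receives [^\\] : the same class, written unambiguously

echo $two, "\n", $three, "\n", $four, "\n";   // print what PCRE is handed
var_dump($three === $four);                   // both literals produce the same string

// Compiling the two-backslash version fails; @ only hides the warning here.
var_dump(@preg_match('/' . $two . '/', 'x'));
```

If this reading is right, four backslashes would be the clearer spelling. Note also that a pattern of this shape needs at least one character between the quotes, so it cannot match an empty value such as "".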
/edit whitespace limit
Upvotes: 2
Reputation: 123791
How about like this?
preg_match_all('/(\w+)\s*=\s*"((?:.*?\"?)*)"/', $parStr, $matches, PREG_SET_ORDER);
It gives me this:
[1] => link
[2] => <a href=\"http://test.com\">test.com</a>
Inside [], every character is treated as a single character on its own. So [^\\"] does not mean "anything except the sequence \"". It means "anything except \ and anything except "".
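To see the character-class point in PHP itself, here is a rough sketch, assuming the pattern lives in a single-quoted string as in the question. In that case PHP first collapses \\ to a single backslash, so the class PCRE compiles appears to be [^\"], which excludes only the quote. Doubling the backslashes again is one way to exclude both characters. The names $original and $fixed are mine, and the sample input is shortened.

```php
<?php
// Assumption: single-quoted PHP strings, as in the question. $original and
// $fixed are illustrative names, and the sample input is shortened.
$original = '/(\w+)\s*=\s*"(([^\\"]*(\\.)?)*)"/';
echo $original, "\n";   // shows the pattern text that PCRE is handed

$parStr = 'tech="PHP, mySQL" link="<a href=\"http://test.com\">test.com</a>"';

// Four backslashes in the PHP source give PCRE \\, i.e. one literal backslash.
// A value is a run of: any char other than " and \, or a backslash plus one char.
$fixed = '/(\w+)\s*=\s*"((?:[^"\\\\]|\\\\.)*)"/s';

preg_match_all($fixed, $parStr, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```

The alternation keeps the two branches mutually exclusive (a plain character, or a backslash plus whatever follows it), so an escaped quote is always consumed as a pair and cannot end the value early.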
UPDATE for multiple values on the same line:
preg_match_all('/(\w+)\s*=\s*"((?:[^\\\]*?(?:\\\")?)*?)"/', $parStr, $matches, PREG_SET_ORDER);
Sample source string:
$parStr = 'title="My Little Website" year="2007" description="Basic website with ..." tech="PHP, mySQL" tech="PHP, mySQL" link="<a href=\"http://test.com\">test.com</a>" link="<a href=\"http://test.com\">test.com</a>" tech="PHP, mySQL" ';
Matches:
Array
(
[0] => Array
(
[0] => title="My Little Website"
[1] => title
[2] => My Little Website
)
[1] => Array
(
[0] => year="2007"
[1] => year
[2] => 2007
)
[2] => Array
(
[0] => description="Basic website with ..."
[1] => description
[2] => Basic website with ...
)
[3] => Array
(
[0] => tech="PHP, mySQL"
[1] => tech
[2] => PHP, mySQL
)
[4] => Array
(
[0] => tech="PHP, mySQL"
[1] => tech
[2] => PHP, mySQL
)
[5] => Array
(
[0] => link="<a href=\"http://test.com\">test.com</a>"
[1] => link
[2] => <a href=\"http://test.com\">test.com</a>
)
[6] => Array
(
[0] => link="<a href=\"http://test.com\">test.com</a>"
[1] => link
[2] => <a href=\"http://test.com\">test.com</a>
)
[7] => Array
(
[0] => tech="PHP, mySQL"
[1] => tech
[2] => PHP, mySQL
)
)
Personally, I don't really like parsing HTML with regex, but I don't see any other option to suggest, so this is just a quick and dirty way. For a big project or big files, I suggest finding a real parser.
Upvotes: 0
Reputation: 57774
I tried it on Linux Fedora PHP 5.2.6 and it seems to work fine. The output is:
[wally@zf ~]$ php -f z.php
title="My Little Website"
year="2007"
description="Basic website with ..."
tech="PHP, mySQL"
link="<a href=\"http://test.com\">test.com</a>"
Upvotes: 0