Reputation: 6441
/^"((?:[^"]|\\.)*)"/
Against this string:
"quote\_with\\escaped\"characters" more
It only matches up to the \", although I've clearly defined \ as an escape character (and it matches \_ and \\ fine...).
Upvotes: 2
Views: 397
Reputation: 2065
Not intended to confuse, just some additional information from my own experiments. The regexp (PCRE) below tries not to match wrong syntax (e.g. a string ending with \") and can be used with either ' or " as the delimiter:
/('|").*\\\1.*?[^\\]\1/
To use it with PHP:
<?php if (preg_match('/(\'|").*\\\\\1.*?[^\\\\]\1/', $subject)) return true; ?>
For:
"quote\_with\\escaped\"characters" "aaa"
'just \'another\' quote "example\"'
"Wrong syntax \"
"No escapes, no match here"
Only these match:
"quote\_with\\escaped\"characters" and
'just \'another\' quote "example\"'
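The behaviour described above can be checked with a small script. This is only a sketch: it ports the pattern to Python's `re` module rather than PCRE (the backreference `\1` works the same way for this case), and the names `pattern`, `subjects` and `results` are made up for illustration.

```python
import re

# Python port of the pattern above; \1 refers back to whichever quote
# character opened the string.
pattern = re.compile(r'''('|").*\\\1.*?[^\\]\1''')

# The four subjects from this answer, kept as raw strings so that Python
# does not interpret the backslashes.
subjects = [
    r'''"quote\_with\\escaped\"characters" "aaa"''',
    r"""'just \'another\' quote "example\"'""",
    r'''"Wrong syntax \"''',
    r'''"No escapes, no match here"''',
]

results = []
for subject in subjects:
    found = pattern.search(subject)
    # Store the matched text, or None when the pattern does not match.
    results.append(found.group() if found else None)

for subject, result in zip(subjects, results):
    print(subject, '->', result)
```

Note that the pattern requires at least one escaped quote inside the string, which is why the last subject is rejected.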
Upvotes: 0
Reputation: 881695
Using Python with raw-string literals to ensure no further interpretation of escape sequences is taking place, the following variant does work:
import re
x = re.compile(r'^"((?:[^"\\]|\\.)*)"')
s = r'"quote\_with\\escaped\"characters" more"'
mo = x.match(s)
print(mo.group())
This emits "quote\_with\\escaped\"characters"; I believe that in your version (which also cuts the match short if substituted in here) the "not a doublequote" subexpression ([^"]) is swallowing the backslashes that you intend to be taken as escaping the immediately following characters. All I'm doing here is ensuring that such backslashes are NOT swallowed in this way, and, as I said, it seems to work with this change.
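To make the difference concrete, here is a minimal side-by-side sketch of the two patterns run against the string from the question. The variable names (`original`, `fixed`, and so on) are mine, chosen only for this illustration.

```python
import re

# Subject string from the question, written as a raw string so Python
# leaves every backslash untouched.
subject = r'"quote\_with\\escaped\"characters" more'

# Pattern from the question: [^"] is tried first and also accepts a backslash,
# so the backslash before the escaped quote is consumed on its own.
original = re.compile(r'^"((?:[^"]|\\.)*)"')

# Variant from this answer: backslashes are excluded from the character class,
# so a backslash can only be consumed together with the character after it.
fixed = re.compile(r'^"((?:[^"\\]|\\.)*)"')

original_match = original.match(subject).group()
fixed_match = fixed.match(subject).group()

print(original_match)  # stops right after the escaped quote
print(fixed_match)     # runs to the real closing quote
```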
Upvotes: 0
Reputation: 37803
It works correctly if you flip the order of your two alternatives:
/^"((?:\\.|[^"])*)"/
The problem is that otherwise the important \ character gets eaten up before it tries matching \". It worked before for \\ and \_ only because both characters in either pair get matched by your [^"].
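A quick way to try the flipped pattern is the Python sketch below (Python's `re` is used here only for convenience; the names are illustrative). It also probes one edge case that may matter depending on your input: because [^"] still accepts a backslash, the regex engine can backtrack and accept an unterminated string such as "abc\" as if it were complete. A pattern that additionally excludes the backslash from the character class, called `strict` here, rejects that input.

```python
import re

subject = r'"quote\_with\\escaped\"characters" more'

# Flipped alternatives: the escape pair is tried before the character class.
flipped = re.compile(r'^"((?:\\.|[^"])*)"')
# Hypothetical stricter variant for comparison: backslash removed from the class.
strict = re.compile(r'^"((?:[^"\\]|\\.)*)"')

flipped_match = flipped.match(subject).group()
print(flipped_match)

# Edge case: the only closing quote is itself escaped.
unterminated = r'"abc\"'
flipped_unterminated = flipped.match(unterminated)
strict_unterminated = strict.match(unterminated)

# Backtracking lets [^"] take the backslash, so the flipped pattern still matches.
print(flipped_unterminated.group() if flipped_unterminated else None)
# The stricter variant has no way to consume a lone backslash and finds no match.
print(strict_unterminated)
```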
Upvotes: 4