Core Xii

Reputation: 6441

How to match a quoted string with escaped quotes in it?

/^"((?:[^"]|\\.)*)"/

Against this string:

"quote\_with\\escaped\"characters" more

It only matches until the \", although I've clearly defined \ as an escape character (and it matches \_ and \\ fine...).

Upvotes: 2

Views: 397

Answers (3)

noomz

Reputation: 2065

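I don't mean to confuse matters; this is just another variant I've played around with. The regexp below (PCRE) tries not to match invalid syntax (e.g. a string that ends with \") and works with either ' or " as the delimiter. Note that it only matches strings that contain at least one escaped delimiter.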

/('|").*\\\1.*?[^\\]\1/

To use it with PHP:

<?php if (preg_match('/(\'|").*\\\\\1.*?[^\\\\]\1/', $subject)) return true; ?>

Given these inputs:

"quote\_with\\escaped\"characters"  "aaa"
'just \'another\' quote "example\"'
"Wrong syntax \"
"No escapes, no match here"

it matches only:

"quote\_with\\escaped\"characters" and
'just \'another\' quote "example\"'
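As a rough check of the list above, here is a minimal Python sketch that runs the same pattern over the four sample lines. It assumes each line is tested on its own and that Python's `re` module handles this particular pattern the same way PCRE does; the names `pattern`, `samples` and `results` are only for illustration.

```python
import re

# Same pattern as above, written as a Python raw string.
pattern = re.compile(r"""('|").*\\\1.*?[^\\]\1""")

samples = [
    r'''"quote\_with\\escaped\"characters"  "aaa"''',
    r"""'just \'another\' quote "example\"'""",
    r'''"Wrong syntax \"''',
    r'''"No escapes, no match here"''',
]

# Test each line separately and record whether the pattern is found in it.
results = [pattern.search(line) is not None for line in samples]

for line, matched in zip(samples, results):
    print("match   " if matched else "no match", line)
```

The last sample is rejected because the pattern insists on an escaped delimiter (the `\\\1` part), so this is a filter for strings containing escaped quotes rather than a general quoted-string matcher.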

Upvotes: 0

Alex Martelli

Reputation: 881695

Using Python with raw-string literals to ensure no further interpretation of escape sequences is taking place, the following variant does work:

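import re

# [^"\\] excludes the backslash, so a backslash can only match via \\.
x = re.compile(r'^"((?:[^"\\]|\\.)*)"')

# Raw string: the backslashes reach the regex engine unchanged.
s = r'"quote\_with\\escaped\"characters" more"'

mo = x.match(s)
print(mo.group())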

This emits "quote\_with\\escaped\"characters". I believe that in your version (which also cuts the match short if substituted in here) the "not a double quote" subexpression ([^"]) is swallowing the backslashes that you intend to escape the immediately following characters. All I'm doing here is excluding the backslash from that character class ([^"\\]) so that such backslashes are NOT swallowed in this way; a backslash can then only be matched together with the character it escapes. As I said, it seems to work with this change.
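To make the "swallowing" visible, here is a rough sketch that steps through the question's string one alternation at a time. The helper `consumed` is hypothetical and only mimics the greedy loop without backtracking, which is enough for this input because the closing quote matches right where the loop stops.

```python
import re

def consumed(alternation, s):
    """List the pieces that (?:alternation)* eats after the opening quote."""
    step = re.compile(alternation)
    pos, pieces = 1, []          # index 0 is the opening double quote
    while True:
        m = step.match(s, pos)
        if m is None:            # neither alternative fits: the loop ends here
            break
        pieces.append(m.group())
        pos = m.end()
    return pieces

s = r'"quote\_with\\escaped\"characters" more'

question = consumed(r'[^"]|\\.', s)     # alternation from the question
fixed = consumed(r'[^"\\]|\\.', s)      # backslash excluded from the class

print(''.join(question))   # stops right after the backslash before the quote
print(''.join(fixed))      # runs to the real closing quote
print(fixed[5], fixed[10]) # escape pairs are consumed as single two-character pieces
```

With the question's alternation every backslash is taken on its own by [^"], so the loop halts at the quote that follows one. With the backslash excluded, `\_`, `\\` and `\"` each come through as a unit.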

Upvotes: 0

VoteyDisciple

Reputation: 37803

It works correctly if you flip the order of your two alternatives:

/^"((?:\\.|[^"])*)"/

The problem is that otherwise the important \ character gets eaten up by [^"] before the regex ever tries to match \" with \\. It worked before for \\ and \_ only because each character in those pairs gets matched individually by your [^"].
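A small Python sketch of the difference, assuming Python's `re` behaves like the question's engine for these patterns. The unterminated input is a made-up example, not from the question; it is included because the flipped order can still fall back to [^"] for a backslash when the engine backtracks, whereas a class that excludes the backslash cannot.

```python
import re

original = re.compile(r'^"((?:[^"]|\\.)*)"')    # pattern from the question
flipped = re.compile(r'^"((?:\\.|[^"])*)"')     # alternatives swapped
strict = re.compile(r'^"((?:[^"\\]|\\.)*)"')    # backslash excluded from the class

s = r'"quote\_with\\escaped\"characters" more'

print(original.match(s).group(1))   # cut short at the escaped quote
print(flipped.match(s).group(1))    # full quoted content
print(strict.match(s).group(1))     # full quoted content

# Hypothetical malformed input: the only other quote is an escaped one.
unterminated = r'"abc\" more'

print(flipped.match(unterminated))  # still matches, by backtracking into [^"]
print(strict.match(unterminated))   # None
```

For well-formed input the flipped and strict patterns agree; they differ only on strings that never close properly.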

Upvotes: 4
