I'm trying to write a regular expression that matches a valid, arbitrary string literal (as you would type it in source code) in languages like Ruby and PHP, e.g.:
"lol" // valid
'was' // valid
"\"say\"" // valid
'\'what\'' // valid
"m"y" // invalid
'ba'd' // invalid
"th\\"is" // invalid
'su\\'cks' // invalid
I'm a little stuck trying to match escaped quotes in the content correctly whilst rejecting an escaped backslash followed by a quote, as in the last two examples.
Any help appreciated!
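The difficulty can be reproduced with a typical first attempt. This is a hypothetical pattern, not taken from the question, written in Python because backreferences and lookaheads behave the same way there as in PHP's PCRE. The attempt treats "a backslash directly before the quote" as an escaped quote. Because `\\\1` only looks at the single character in front of the quote, it cannot tell that in `"th\\"is"` that backslash is itself escaped. The names `NAIVE` and `naive_is_valid` are made up for this sketch.

```python
import re

# Hypothetical naive attempt: a backslash immediately followed by the opening
# quote counts as an escaped quote; any other char except that quote is content.
NAIVE = re.compile(r"""(["'])(?:\\\1|(?!\1).)*\1""")

def naive_is_valid(s: str) -> bool:
    # fullmatch anchors the pattern at both ends of the input.
    return NAIVE.fullmatch(s) is not None

print(naive_is_valid('"\\"say\\""'))   # True, correct
print(naive_is_valid('"th\\\\"is"'))   # True, but this literal is invalid
```

The second call succeeds because the pattern pairs the second backslash with the following quote, leaving the first backslash to be consumed as ordinary content.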
This matches your first 4 lines and rejects the last 4:
^(["'])(\\.|(?!\\|\1).)*\1$
A quick explanation:
^ # the start of the input
(["']) # match a single- or double quote and store it in group 1
( # open group 2
\\. # a backslash followed by any char (except a newline)
| # OR
(?!\\|\1). # if neither a backslash nor the quote matched in group 1 comes next, match any char (except a newline)
)* # close group 2 and repeat it zero or more times
\1 # the same quote as matched in group 1
$ # the end of the input
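The alternatives in group 2 never compete. At a backslash only `\\.` can apply. At any other character only the lookahead branch can apply. At the opening quote neither can apply, so the group must stop there and `\1` must match. That makes it possible to express the same check as a simple left-to-right scan. The following is a minimal Python sketch of that idea, offered as an illustration rather than part of the original answer. `is_valid_literal` is a made-up name. The sketch assumes, as with `.` when the `s` modifier is absent, that a newline anywhere in the literal makes it invalid. It mirrors a full-string match of the pattern.

```python
def is_valid_literal(s: str) -> bool:
    """Hand-written equivalent of a full match of ^(["'])(\\.|(?!\\|\1).)*\1$."""
    # Group 1: the literal must open with a single or double quote
    # (and needs at least one more char for the closing quote).
    if len(s) < 2 or s[0] not in "\"'":
        return False
    quote = s[0]
    i = 1
    while i < len(s):
        ch = s[i]
        if ch == "\n":
            # '.' does not match a newline, so neither branch accepts one.
            return False
        if ch == "\\":
            # Branch \\. : a backslash consumes the next char, whatever it is,
            # including another backslash or the quote.
            if i + 1 >= len(s) or s[i + 1] == "\n":
                return False
            i += 2
        elif ch == quote:
            # Unescaped opening quote: it closes the literal (\1), so it
            # must also be the last char of the input ($).
            return i == len(s) - 1
        else:
            # Branch (?!\\|\1). : any other char is plain content.
            i += 1
    # Input ended without a closing quote.
    return False
```

Because a backslash always swallows the character after it, `\\` is consumed as one unit. The quote that follows it in `"th\\"is"` is therefore unescaped and ends the literal early.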
Here's a little PHP demo:
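<?php
$tests = array(
    '"lol"',
    "'was'",
    '"\\"say\\""',    // the string "\"say\""
    "'\\'what\\''",   // the string '\'what\''
    '"m"y"',
    "'ba'd'",
    '"th\\\\"is"',    // the string "th\\"is"
    "'su\\\\'cks'"    // the string 'su\\'cks'
);

// Inside a single-quoted PHP string, \\\\ becomes \\, which the regex engine
// reads as one literal backslash, so the pattern is ^(["'])(\\.|(?!\\|\1).)*\1$
foreach ($tests as $test) {
    if (preg_match('/^(["\'])(\\\\.|(?!\\\\|\1).)*\1$/', $test)) {
        echo "valid : " . $test . "\n";
    } else {
        echo "invalid : " . $test . "\n";
    }
}
?>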
which produces:
valid : "lol"
valid : 'was'
valid : "\"say\""
valid : '\'what\''
invalid : "m"y"
invalid : 'ba'd'
invalid : "th\\"is"
invalid : 'su\\'cks'
as can be seen on ideone: http://ideone.com/60mtE
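For comparison, here is a sketch of the same demo ported to Python. This is my port, not the original code, and `LITERAL` and `is_valid` are made-up names. A raw string passes the pattern's backslashes to the regex engine unchanged, so the extra layer of escaping needed in the PHP source goes away. `re.fullmatch` replaces `^...$`. In Python `$` also matches just before a trailing newline, as it does in PHP's PCRE without the `D` modifier.

```python
import re

# Same pattern as above without ^ and $: re.fullmatch anchors both ends,
# so an input with a trailing newline is rejected.
LITERAL = re.compile(r"""(["'])(\\.|(?!\\|\1).)*\1""")

def is_valid(s: str) -> bool:
    return LITERAL.fullmatch(s) is not None

tests = [
    '"lol"',
    "'was'",
    '"\\"say\\""',
    "'\\'what\\''",
    '"m"y"',
    "'ba'd'",
    '"th\\\\"is"',
    "'su\\\\'cks'",
]

for test in tests:
    status = "valid" if is_valid(test) else "invalid"
    print(f"{status} : {test}")
```

The Python string literals in `tests` use the same escaping as the PHP ones, so the loop prints the same eight lines as the PHP demo.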