connec

Reputation: 7421

Regular Expression to check typical 'string' (type) syntax

I'm trying to write a regular expression that matches a valid, arbitrary string literal (as you would type it in source code) from languages like Ruby and PHP, e.g.:

"lol"      // valid
'was'      // valid
"\"say\""  // valid
'\'what\'' // valid
"m"y"      // invalid
'ba'd'     // invalid
"th\\"is"  // invalid
'su\\'cks' // invalid

I'm a little stuck: I can't get it to accept escaped quotes inside the string while still rejecting an escaped backslash followed by a quote (the last two examples).

Any help appreciated!

Upvotes: 0

Views: 243

Answers (1)

Bart Kiers

Reputation: 170158

This matches your first four examples and rejects the last four:

^(["'])(\\.|(?!\\|\1).)*\1$

A quick explanation:

^               # the start of the input
(["'])          # match a single- or double quote and store it in group 1
(               # open group 2
  \\.           #   a backslash followed by any char
  |             #   OR
  (?!\\|\1).    #   any char, provided it is neither a backslash nor the quote matched in group 1
)*              # close group 2 and repeat it zero or more times
\1              # the same quote as matched in group 1
$               # the end of the input
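
If the escaping gets hard to follow, the same pattern can be kept in its commented layout inside the code. The following is only a sketch: the function name `is_quoted_literal` is made up for illustration, and it assumes PHP 7 or later (for the type declarations). It uses a nowdoc so that no PHP string escaping is applied, and the `x` modifier so that whitespace and `#` comments inside the pattern are ignored.

```php
<?php
// Hypothetical helper (name chosen for illustration only).
// The pattern is the one from this answer, spread out with the x modifier.
function is_quoted_literal(string $s): bool
{
    // Nowdoc: the text between the markers is taken literally,
    // so every backslash below reaches the regex engine unchanged.
    $pattern = <<<'REGEX'
/^
(["'])          # opening quote, stored in group 1
(
    \\.         # a backslash followed by any char
  |
    (?!\\|\1).  # any char that is neither a backslash nor the group 1 quote
)*
\1              # closing quote, same as the opening one
$/x
REGEX;

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $s) === 1;
}
```

With this, `is_quoted_literal('"lol"')` returns true and `is_quoted_literal('"m"y"')` returns false. Avoid putting a `/` inside the pattern comments, since PHP would treat it as the closing delimiter.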

Here's a little PHP demo:

<?php
$tests = array(
    '"lol"',
    "'was'",
    '"\\"say\\""',
    "'\\'what\\''",
    '"m"y"',
    "'ba'd'",
    '"th\\\\"is"',
    "'su\\\\'cks'"
);
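foreach($tests as $test) {
  // In a single-quoted PHP string, \\ is one backslash and \' is a quote, while \1 is left as-is.
  // So \\\\ below reaches the regex engine as \\ (one literal backslash).
  if(preg_match('/^(["\'])(\\\\.|(?!\\\\|\1).)*\1$/', $test)) {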
    echo "valid   : " . $test . "\n";
  }
  else {
    echo "invalid : " . $test . "\n";
  }
}
?>

which produces:

valid   : "lol"
valid   : 'was'
valid   : "\"say\""
valid   : '\'what\''
invalid : "m"y"
invalid : 'ba'd'
invalid : "th\\"is"
invalid : 'su\\'cks'

as can be seen on ideone: http://ideone.com/60mtE

Upvotes: 4
