Jake

Reputation: 3085

How do I match a pattern with optional surrounding quotes?

How would one write a regex that matches a pattern that can contain quotes, but if it does, must have matching quotes at the beginning and end?

"?(pattern)"?

Will not work because it will allow patterns that begin with a quote but don't end with one.

"(pattern)"|(pattern)

Will work, but is repetitive. Is there a better way to do that without repeating the pattern?
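To make the difference between the two attempts concrete, here is a minimal sketch in Python's re module. The literal word pattern is only a stand-in for the real expression, and the variable names are made up for illustration:

```python
import re

# "pattern" is a placeholder for whatever expression is actually being matched.
loose = re.compile(r'"?(pattern)"?')           # each quote is independently optional
strict = re.compile(r'"(pattern)"|(pattern)')  # correct, but repeats the inner pattern

samples = ['pattern', '"pattern"', '"pattern', 'pattern"']

# fullmatch requires the entire string to match, so stray quotes are not ignored.
loose_ok = [s for s in samples if loose.fullmatch(s)]
strict_ok = [s for s in samples if strict.fullmatch(s)]

print(loose_ok)   # includes the unbalanced inputs
print(strict_ok)  # only the balanced inputs
```

The first expression accepts all four inputs, while the second accepts only the two balanced ones.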

Upvotes: 21

Views: 14584

Answers (6)

Keith Hanlan

Reputation: 867

For regex engines that support the (?P<name>...) named-group syntax, the following Python re example shows how a backreference can ensure that the closing quote matches an optional opening quote:

import re

re.compile(r"""^
            (?P<quote>['"]?)                  # optional quote
            (?P<sha1>([0-9a-f]{40}))\s+       # SHA-1 of commit-id
            # ...
            (?P=quote)$""",                   # match opening quote if it exists
            re.VERBOSE)

Note the (?P=quote) at the end, which refers back to the named group at the beginning. This ensures that blah, 'blah', and "blah" all match but 'blah" does not.
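A reduced, runnable variant of the same idea might look like the sketch below. The SHA-1 part is replaced by \w+ and the group name body is hypothetical, chosen only to keep the example short:

```python
import re

# Simplified stand-in for the expression above: \w+ replaces the real payload.
quoted = re.compile(r"""^
    (?P<quote>['"]?)   # optional opening quote (captures an empty string if absent)
    (?P<body>\w+)      # hypothetical payload
    (?P=quote)$        # must repeat whatever the opening group captured
    """, re.VERBOSE)

for text in ["blah", "'blah'", '"blah"', "'blah\""]:
    m = quoted.match(text)
    # Print the captured payload, or None when the quoting is unbalanced.
    print(repr(text), "->", m.group("body") if m else None)
```

Because the ? sits inside the group, the group always takes part in the match, so the backreference is defined even when no quote is present.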

The same can be done in Perl, but its support for (?P=NAME) does not permit a single quote to be used as a delimiter. So

echo \"blah\" | perl -ne 'm/(?P<quote>["x]?)(\w+)(?P=quote)/ and print $2;'

works, but to support the single quote one must use \k{NAME} for the backreference instead; cf. perlre(1).

Upvotes: 0

marbri91

Reputation: 36

Generally, @Daniel Vandersluis's answer would work. However, some engines treat the optional group (")? as unset when it matches nothing, so the backreference \1 then fails to match.

To avoid this problem, a more robust solution is:

/^("|)(pattern)\1$/

The first group then always takes part in the match, capturing an empty string when there is no quote, so \1 is always defined. The expression can also be adapted when there is a prefix that should be captured first:

/^(key)=("|)(value)\2$/
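Python's re module is one engine where the difference shows up, so it can serve as a quick check. This is a sketch with the literal word pattern as a placeholder and illustrative variable names:

```python
import re

# Group 1 is skipped entirely when there is no quote, so \1 refers to an unset group.
optional_group = re.compile(r'^(")?(pattern)\1$')
# Group 1 always participates: it captures either a quote or the empty string.
empty_alternative = re.compile(r'^("|)(pattern)\1$')

for text in ['pattern', '"pattern"', '"pattern', 'pattern"']:
    print(text,
          bool(optional_group.match(text)),
          bool(empty_alternative.match(text)))
```

In Python the first expression rejects the unquoted input, while the second accepts both balanced forms and rejects both unbalanced ones. Other engines may behave differently for the first expression.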

Upvotes: 0

Jonas Stensved

Reputation: 15326

This is quite simple as well: (".+"|.+). Make sure the quoted alternative comes first and the unquoted one second.

Upvotes: 2

Daniel Vandersluis

Reputation: 94284

You can avoid repeating the pattern by using backreferences and conditionals:

/^(")?(pattern)(?(1)\1|)$/

Matches:

  • pattern
  • "pattern"

Doesn't match:

  • "pattern
  • pattern"

This pattern is somewhat complex, however. It first looks for an optional quote and stores it in capture group 1 if one is found. Then it matches your pattern. Finally, it uses conditional syntax to say "if group 1 matched, match its contents again; otherwise match nothing". The whole pattern is anchored (which means that it needs to appear by itself on a line); without the anchors, input with an unbalanced quote would still match, because the pattern inside pattern" would be found.

Note that support for conditionals varies by engine and the more verbose but repetitive expressions will be more widely supported (and likely easier to understand).
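Python's re module is one engine that accepts this conditional syntax, so the expression can be tried there as a sketch. The word pattern is again a placeholder and the variable names are illustrative:

```python
import re

# (?(1)\1|) reads: if group 1 took part in the match, require its text again;
# otherwise match the empty string.
conditional = re.compile(r'^(")?(pattern)(?(1)\1|)$')

samples = ['pattern', '"pattern"', '"pattern', 'pattern"']
results = {s: bool(conditional.match(s)) for s in samples}
print(results)
```

Only the two balanced inputs are accepted.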


Update: A much simpler version of this regex would be /^(")?(pattern)\1$/, which does not need a conditional. When I was testing this initially, the tester I was using gave me a false negative, which led me to discount it (oops!).

I'll leave the solution with the conditional up for posterity and interest, but this is a simpler version that is more likely to work in a wider variety of engines (backreferences are the only feature being used here which might be unsupported).

Upvotes: 29

rubber boots

Reputation: 15204

This should work with a recursive regex (which takes longer to get right). In the meantime: in Perl, you can build a self-modifying regex using a postponed subexpression, (??{ ... }). I'll leave that as an academic example ;-)

my @stuff = ( '"pattern"', 'pattern', 'pattern"', '"pattern'  );

foreach (@stuff) {
   print "$_ OK\n" if /^
                        (")?
                        \w+
                        (??{defined $1 ? '"' : ''})
                       $
                      /x
}

Result:

"pattern" OK
pattern OK

Upvotes: 0

zigdon

Reputation: 15073

Depending on the language you're using, you should be able to use backreferences. Something like this, say:

^(["'])(pattern)\1$|^(pattern)$

That way, you're requiring that either there are no quotes, or that the SAME quote is used on both ends.
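As a rough illustration, the same idea can be wrapped in a small Python helper. The function name unquote is hypothetical, pattern is a placeholder, and both alternatives are anchored here so that the whole string has to match:

```python
import re

# Either a quoted form whose closing quote repeats the opening one,
# or the bare form with no quotes at all.
either_quote = re.compile(r'''^(?:(["'])(pattern)\1|(pattern))$''')

def unquote(text):
    """Return the inner text if the quoting is balanced, else None."""
    m = either_quote.match(text)
    if m is None:
        return None
    # Group 2 is set for the quoted form, group 3 for the bare form.
    return m.group(2) or m.group(3)

for text in ['pattern', '"pattern"', "'pattern'", "'pattern\"", '"pattern']:
    print(repr(text), "->", unquote(text))
```

The inner pattern is still written twice, which is the trade-off of this approach compared with the backreference-only versions.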

Upvotes: 1
