Reputation: 3085
How would one write a regex that matches a pattern that can contain quotes, but if it does, must have matching quotes at the beginning and end?
"?(pattern)"?
Will not work because it will allow patterns that begin with a quote but don't end with one.
"(pattern)"|(pattern)
Will work, but is repetitive. Is there a better way to do that without repeating the pattern?
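To make the difference between the two attempts concrete, here is a small Python sketch. The literal word pattern stands in for the real sub-pattern, the ^ and $ anchors are added only for the demonstration, and the names LOOSE, STRICT and accepts are hypothetical.

```python
import re

# `pattern` is a stand-in for the real sub-pattern; anchors added for the demo.
LOOSE = re.compile(r'^"?(pattern)"?$')               # each quote is optional on its own
STRICT = re.compile(r'^(?:"(pattern)"|(pattern))$')  # quotes must come as a pair

def accepts(regex, text):
    """Return True if the whole string matches the given compiled regex."""
    return regex.match(text) is not None

for text in ['pattern', '"pattern"', '"pattern', 'pattern"']:
    print(f'{text!r}: loose={accepts(LOOSE, text)} strict={accepts(STRICT, text)}')
```

The loose form accepts all four inputs, including the two with an unbalanced quote, while the strict form accepts only the first two.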
Upvotes: 21
Views: 14584
Reputation: 867
For regex engines that support the (?P<name>...) named-group syntax, the following Python re example shows how to ensure that the closing quote matches an optional opening quote:
re.compile(r"""^
(?P<quote>['"]?) # optional quote
(?P<sha1>([0-9a-f]{40}))\s+ # SHA-1 of commit-id
# ...
(?P=quote)$""", # match opening quote if it exists
re.VERBOSE)
Note the (?P=quote) at the end, which refers back to the named group at the beginning. This ensures that blah, 'blah', and "blah" all match but 'blah" does not.
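A stripped-down, runnable version of the same idea might look like the sketch below. It is a simplification, not the original commit-parsing regex: \w+ stands in for the payload, and the names QUOTED and body_of are hypothetical.

```python
import re

# Hypothetical simplified pattern: \w+ stands in for the real payload.
QUOTED = re.compile(r"""^
    (?P<quote>['"]?)   # optional opening quote (may match the empty string)
    (?P<body>\w+)      # the payload
    (?P=quote)$        # must repeat whatever the opening quote matched
    """, re.VERBOSE)

def body_of(text):
    """Return the unquoted payload, or None if the quotes do not pair up."""
    m = QUOTED.match(text)
    return m.group("body") if m else None

for text in ["blah", "'blah'", '"blah"', "'blah\""]:
    print(repr(text), "->", body_of(text))
```

Because the ? sits inside the group, the group always takes part in the match (possibly capturing an empty string), so the backreference has something to compare against even when no quote is present.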
The same can be done in Perl, but its support for (?P=NAME) does not permit a single quote to be used as a delimiter. So
echo \"blah\" | perl -ne 'm/(?P<quote>["x]?)(\w+)(?P=quote)/ and print $2;'
works, but to support the single quote one must use \k{NAME} to back-reference instead. Cf. perlre(1).
Upvotes: 0
Reputation: 36
Generally, @Daniel Vandersluis's response would work. However, some regex engines treat the optional group (")? as unset when it matches nothing, so the backreference \1 then fails to match.
To avoid this problem, a more robust solution is:
/^("|)(pattern)\1$/
Then the first group always participates in the match, even if it captures only an empty string. This expression can also be adapted if there is a prefix in the expression that you want to capture first:
/^(key)=("|)(value)\2$/
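The difference can be checked in Python's re module, which is one engine where a backreference to a group that did not participate fails (this is an observation about Python, not a claim about which engine the answer had in mind). In this sketch pattern, key and value are literal stand-ins, and the constant and function names are hypothetical.

```python
import re

# `pattern`, `key` and `value` are literal stand-ins for the real sub-patterns.
OPTIONAL_GROUP = re.compile(r'^(")?(pattern)\1$')     # group 1 may not participate
EMPTY_ALTERNATIVE = re.compile(r'^("|)(pattern)\1$')  # group 1 always participates
KEY_VALUE = re.compile(r'^(key)=("|)(value)\2$')      # same idea after a prefix

def matches(regex, text):
    """Return True if the whole string matches the given compiled regex."""
    return regex.match(text) is not None

print(matches(OPTIONAL_GROUP, 'pattern'))     # False in Python: \1 refers to an unset group
print(matches(EMPTY_ALTERNATIVE, 'pattern'))  # True: group 1 captured the empty string
print(matches(KEY_VALUE, 'key="value"'))      # True
print(matches(KEY_VALUE, 'key="value'))       # False: closing quote is missing
```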
Upvotes: 0
Reputation: 15326
This is quite simple as well: (".+"|.+). Make sure the quoted alternative comes first and the unquoted one second.
Upvotes: 2
Reputation: 94284
You can get a solution without repeating by making use of backreferences and conditionals:
/^(")?(pattern)(?(1)\1|)$/
Matches:
pattern
"pattern"
Doesn't match:
"pattern
pattern"
This pattern is somewhat complex, however. It first looks for an optional quote and captures it in group 1 if one is found. Then it matches your pattern. Then it uses conditional syntax to say "if group 1 captured something, match it again; otherwise match nothing". The whole pattern is anchored (which means that it must appear by itself on a line) so that unmatched quotes won't be accepted (otherwise the pattern in pattern" would match).
Note that support for conditionals varies by engine and the more verbose but repetitive expressions will be more widely supported (and likely easier to understand).
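Python's re module is one engine that does support this conditional syntax, so the behaviour can be tried there. This is only a sketch: pattern is a literal stand-in for the real sub-pattern, and the names CONDITIONAL and is_balanced are hypothetical.

```python
import re

# `pattern` is a literal stand-in for the real sub-pattern.
# (?(1)\1|) means: if group 1 took part in the match, require it again;
# otherwise match the empty string.
CONDITIONAL = re.compile(r'^(")?(pattern)(?(1)\1|)$')

def is_balanced(text):
    """Return True if text is the pattern with either no quotes or a matching pair."""
    return CONDITIONAL.match(text) is not None

for text in ['pattern', '"pattern"', '"pattern', 'pattern"']:
    print(repr(text), is_balanced(text))
```

Only the first two inputs are accepted; the two with a single stray quote are rejected because of the anchors and the conditional.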
Update: A much simpler version of this regex would be /^(")?(pattern)\1$/, which does not need a conditional. When I was testing this initially, the tester I was using gave me a false negative, which led me to discount it (oops!).
I'll leave the solution with the conditional up for posterity and interest, but this is a simpler version that is more likely to work in a wider variety of engines (backreferences are the only feature being used here which might be unsupported).
Upvotes: 29
Reputation: 15204
This should work with a recursive regex (which takes longer to get right). In the meantime: in Perl, you can build a self-modifying regex. I'll leave that as an academic example ;-)
my @stuff = ( '"pattern"', 'pattern', 'pattern"', '"pattern' );
foreach (@stuff) {
print "$_ OK\n" if /^
(")?
\w+
(??{defined $1 ? '"' : ''})
$
/x
}
Result:
"pattern" OK
pattern OK
Upvotes: 0
Reputation: 15073
Depending on the language you're using, you should be able to use backreferences. Something like this, say:
^(["'])(pattern)\1$|^(pattern)$
That way, you're requiring that either there are no quotes, or that the SAME quote is used on both ends.
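As a rough Python sketch of this approach, with both alternatives anchored (the anchoring and grouping are assumptions about the intent), pattern used as a literal stand-in, and EITHER_QUOTE and inner as hypothetical names:

```python
import re

# `pattern` is a literal stand-in; both alternatives share one pair of anchors.
EITHER_QUOTE = re.compile(r'''^(?:(["'])(pattern)\1|(pattern))$''')

def inner(text):
    """Return the unquoted pattern text, or None if the quoting is inconsistent."""
    m = EITHER_QUOTE.match(text)
    if m is None:
        return None
    # Group 2 is set for the quoted alternative, group 3 for the bare one.
    return m.group(2) or m.group(3)

for text in ['pattern', '"pattern"', "'pattern'", "'pattern\"", '"pattern']:
    print(repr(text), "->", inner(text))
```

The backreference \1 is what forces the closing quote to be the same character as the opening one, so mixed quotes are rejected.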
Upvotes: 1