Reputation: 282825
I wrote this regex to match strings:
(?>(?<Quote>""|').*?(?<!\\)\k<Quote>)
i.e., some text enclosed in quotes. It also supports escaping, so it will match "hello\"world"
in its entirety without stopping at the escaped quote, like I want. But I forgot about double-escaping: in "hello\\"world"
the backslash is itself escaped, so the string should end at the quote right after it, yet my regex still matches the whole thing.
I'm pretty sure this is possible to fix with balancing groups, but I've never really used them before. Anyone know how to write this?
Upvotes: 1
Views: 188
Reputation: 210392
Regular expressions are not a great fit for constructs with escape sequences.
I don't think it's possible to do this in any "nice" kind of way (if at all), although I'll post an edit if I figure out otherwise.
Balancing group definitions are for nested constructs. Nesting doesn't happen in strings, so balancing group definitions don't seem to even be the right tool for this.
It depends on how many features you're looking for. If you simply want to match a double-quoted string at the start of the input, allowing backslash escapes inside it, you can use the pattern
^"([^\\\"]|\\.)*"
which, when escaped for a string literal in code, becomes
"^\"([^\\\\\\\"]|\\\\.)*\""
to match something like
"Hello! \" Hi! \" "
but as soon as you start adding more complicated requirements like Unicode escapes, it becomes a lot more tedious. Just write the scanner by hand; it should be much simpler.
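To make both options concrete, here is a rough sketch in Python rather than .NET (an assumption on my part; the character class, alternation, and `\\.` parts of the pattern should behave the same way in both engines). `STRING_RE`, `first_string`, and `scan_string` are hypothetical names for illustration. The regex is the pattern above minus the `^` anchor, and the loop is one way "by hand" might look.

```python
import re

# The pattern from above without the ^ anchor, so search() can find a
# literal anywhere; (?:...) replaces the capturing group, which is not needed.
STRING_RE = re.compile(r'"(?:[^\\"]|\\.)*"')

def first_string(text):
    """Return the first double-quoted literal in text, or None."""
    m = STRING_RE.search(text)
    return m.group(0) if m else None

def scan_string(text, start=0):
    """Hand-written alternative: return the index just past the closing
    quote of the literal beginning at text[start], or -1 if unterminated."""
    quote = text[start]
    if quote not in "\"'":
        raise ValueError("expected a quote character at start")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2          # skip the backslash and whatever it escapes
        elif ch == quote:
            return i + 1    # position just past the closing quote
        else:
            i += 1
    return -1               # ran off the end: no closing quote

print(first_string(r'"hello\"world"'))    # "hello\"world"
print(first_string(r'"hello\\"world"'))   # "hello\\"
print(scan_string(r'"hello\\"world"'))    # 9
```

The loop accepts either quote character, like the `Quote` group in the question, and something like Unicode escapes would presumably be one more branch under the backslash case instead of another layer of regex escaping.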
If you're curious about how balancing group definitions work anyway, I recommend reading page 430 of this book (34 in pdf).
Upvotes: 1