mpen

Reputation: 282825

Regex to match escapable strings?

I wrote this regex to match strings:

(?>(?<Quote>"|').*?(?<!\\)\k<Quote>)

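i.e., some text enclosed in quotes. It also supports escaping, so it matches "hello\"world" in its entirety without stopping at the escaped quote, which is what I want. But I forgot about escaped backslashes. For example, "hello\\"world" is not a valid single string: the \\ is an escaped backslash, so the quote right after it closes the string. My lookbehind only sees the backslash in front of that quote and keeps going.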
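To reproduce the problem outside .NET, here is a rough Python port of the pattern above. The port is an assumption, not the original code: Python spells named groups `(?P<Quote>...)` and backreferences `(?P=Quote)`, and the atomic group is dropped because nothing follows it, so it would not change which match is found. `first_string` is a hypothetical helper.

```python
import re
from typing import Optional

# Hypothetical Python port of the question's .NET pattern:
#   (?<Quote>...) -> (?P<Quote>...), \k<Quote> -> (?P=Quote).
# The outer atomic group (?>...) is omitted; nothing follows it, so the
# first match found is the same either way.
ASKER_PATTERN = re.compile(r"""(?P<Quote>"|').*?(?<!\\)(?P=Quote)""")


def first_string(text: str) -> Optional[str]:
    """Return the first quoted string the lookbehind-based pattern finds, or None."""
    m = ASKER_PATTERN.search(text)
    return m.group() if m else None


# The escaped quote is skipped, as intended:
print(first_string(r'"hello\"world"'))   # "hello\"world"
# The escaped backslash fools the lookbehind: the quote after \\ really
# closes the string, but the pattern runs on to the last quote.
print(first_string(r'"hello\\"world"'))  # "hello\\"world"
```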

I'm pretty sure this is possible to fix with balancing groups, but I've never really used them before. Does anyone know how to write this?

Upvotes: 1

Views: 188

Answers (1)

user541686

Reputation: 210392

Regular expressions are not meant to be used for escaped constructs.

I don't think it's possible to do this in any "nice" kind of way (if at all), although I'll post an edit if I figure out otherwise.

Balancing group definitions are for nested constructs. Nesting doesn't happen in strings, so balancing group definitions don't even seem to be the right tool for this.


Edit 1:

It depends on how many features you're looking for. If you simply want to match a double-quoted string that may contain backslash escapes such as \", you can use the pattern

^"([^\\\"]|\\.)*"

which, once escaped for use inside a double-quoted string literal in code, becomes

"^\"([^\\\\\\\"]|\\\\.)*\""

to match something like

"Hello! \" Hi! \" "

but as soon as you start adding more complicated requirements, such as Unicode escapes, it becomes a lot more tedious. At that point, just scan the string by hand; it should be much simpler.
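The Edit 1 pattern can be checked quickly in Python, and the "by hand" alternative sketched as a small scanner. Python is used only for illustration: `match_quoted`, `scan_quoted` and `SIMPLE_ESCAPES` are hypothetical names, and the set of escapes the scanner accepts (`\n`, `\t`, `\r`, `\\`, `\"`, `\'`, `\uXXXX`) is an assumption. Note that the regex already handles the question's `"hello\\"world"` case: the `\\.` branch consumes the escaped backslash as a unit, so the following quote closes the string.

```python
import re
import string
from typing import Optional, Tuple

# Edit 1's pattern as a Python raw string (no extra escaping needed).
RAW = r'^"([^\\\"]|\\.)*"'
# The fully escaped literal from the answer is also valid Python and yields the same text.
ESCAPED = "^\"([^\\\\\\\"]|\\\\.)*\""
assert RAW == ESCAPED

QUOTED = re.compile(RAW)


def match_quoted(text: str) -> Optional[str]:
    """Hypothetical helper: return the double-quoted string at the start of text, or None."""
    m = QUOTED.match(text)
    return m.group() if m else None


# Hypothetical escape table for the hand-written scanner (an assumption, not a spec).
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"', "'": "'"}


def scan_quoted(text: str, start: int = 0) -> Tuple[str, int]:
    """Scan a '...' or "..." string that begins at text[start].

    Returns (decoded contents, index just past the closing quote).
    Raises ValueError for a missing opening quote, a bad escape,
    or an unterminated string.
    """
    quote = text[start] if start < len(text) else ""
    if quote not in ('"', "'"):
        raise ValueError(f"expected a quote at index {start}")
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == quote:
            return "".join(out), i + 1  # unescaped quote: the string ends here
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(text):
            break  # lone backslash at end of input
        esc = text[i + 1]
        if esc == "u":
            digits = text[i + 2:i + 6]
            if len(digits) != 4 or any(c not in string.hexdigits for c in digits):
                raise ValueError(f"bad \\u escape at index {i}")
            out.append(chr(int(digits, 16)))
            i += 6
        elif esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        else:
            raise ValueError(f"unknown escape \\{esc} at index {i}")
    raise ValueError(f"unterminated string starting at index {start}")


print(match_quoted(r'"hello\\"world"'))  # "hello\\"
print(scan_quoted(r'"hello\\"world"'))   # ('hello\\', 9)
```

Because `scan_quoted` returns the index just past the closing quote, a tokenizer can resume from there, and supporting another kind of escape is one more branch rather than a harder regex. The regex version, by contrast, would also need `re.DOTALL` if an escaped newline should count, since `.` does not match `\n` by default.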


Edit 2:

If you're curious about how balancing group definitions work anyway, I recommend reading page 430 of this book (page 34 of the PDF).

Upvotes: 1
