randomdev8712

Reputation: 271

regex for string with backslash for escape

I'm trying to come up with a pattern that matches all text between double or single quotation marks in Java source code. This is what I have:

"(.*?)"|'(.*?)'


This works for almost every case, I think, except one:

"text\"moretext\"evenmore"


This is a valid String literal, because the inner quotes are escaped. The pattern stops at the first escaped quote, so it does not recognize the inner part, moretext, as belonging to the string.

Any ideas for a pattern that accounts for this case?

Upvotes: 1

Views: 1184

Answers (2)

BladeMight

Reputation: 2810

This should work:

"([^"\\]|\\.)*"|'([^'\\]|\\.)*'

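Explanation:

  1. " matches the opening ".
  2. [^"\\]|\\. matches either a single character that is neither " nor \, or a \ followed by any character, so an escaped quote \" is consumed as part of the string.
  3. * repeats that group zero or more times.
  4. " matches the closing ".

The alternative for ' works the same way.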
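A minimal Java sketch of how this pattern could be applied. The class name QuotedLiteralFinder and the method findQuoted are hypothetical, chosen only for illustration. The main thing to watch is the extra layer of escaping: inside a Java string literal every regex backslash is doubled and every " is escaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class QuotedLiteralFinder {
    // Regex: "([^"\\]|\\.)*"|'([^'\\]|\\.)*'
    // written as a Java string literal (backslashes doubled, " escaped).
    private static final Pattern QUOTED = Pattern.compile(
            "\"([^\"\\\\]|\\\\.)*\"|'([^'\\\\]|\\\\.)*'");

    // Returns every quoted literal found in the text, quotes included.
    static List<String> findQuoted(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(source);
        while (m.find()) {
            found.add(m.group()); // group() is the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        // The text being scanned is:
        //   String s = "text\"moretext\"evenmore"; char c = '\'';
        String source =
                "String s = \"text\\\"moretext\\\"evenmore\"; char c = '\\'';";
        for (String literal : findQuoted(source)) {
            System.out.println(literal);
        }
    }
}
```

This sketch only looks at quotes and escapes. It does not know about comments or text blocks, so a quote inside a comment would still be picked up.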

Upvotes: 3

anubhava

Reputation: 785246

You can use this regex to match a single- or double-quoted string while skipping over escaped quotes:

(["'])([^\\]*?(?:\\.[^\\]*?)*)\1


RegEx Breakup:

  • (["']): Match a single or double quote and capture it in group #1
  • (: Start capturing group #2
    • [^\\]*?: Match 0 or more characters that are not a \
    • (?:: Start non-capturing group
      • \\: Match a \
      • .: Followed by any character (the escaped one)
      • [^\\]*?: Followed by 0 or more non-\ characters
    • )*: End non-capturing group. Match 0 or more repetitions of this group
  • ): End capturing group #2
  • \1: Match the closing quote, which must be the same character captured in group #1
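As a rough illustration, the backreference version could be used from Java as sketched below. The class name BackreferenceQuoteMatcher and the method contents are made up for this example. Because the text between the quotes is captured in group #2, the sketch returns only the contents, without the surrounding quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class BackreferenceQuoteMatcher {
    // Regex: (["'])([^\\]*?(?:\\.[^\\]*?)*)\1
    // written as a Java string literal (backslashes doubled, " escaped).
    private static final Pattern QUOTED = Pattern.compile(
            "([\"'])([^\\\\]*?(?:\\\\.[^\\\\]*?)*)\\1");

    // Returns the text between each pair of matching quotes.
    static List<String> contents(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(source);
        while (m.find()) {
            found.add(m.group(2)); // group 1 is the quote, group 2 the contents
        }
        return found;
    }

    public static void main(String[] args) {
        // The text being scanned is:
        //   String s = "text\"moretext\"evenmore";
        String source = "String s = \"text\\\"moretext\\\"evenmore\";";
        for (String inner : contents(source)) {
            System.out.println(inner);
        }
    }
}
```

Note that . does not match line terminators by default in java.util.regex, so an escape sequence that spans a line break would not be matched by this sketch.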

Upvotes: 4
