johni

Reputation: 5568

regex multiple quotes selection

I'm trying to extract substrings from one big string. The extracted substrings should follow these rules:

  1. Text between two double quotes is extracted (e.g. "hello \"jonathan\" how are you" would yield jonathan, without the double quotes).

  2. Same as 1, just with single quotes.

  3. A single quote is treated as a regular character when it is surrounded by double quotes (e.g. "Hello "Jonathan how 'are'" you today" would yield the substring Jonathan how 'are', without the double quotes).

I've been trying many combinations involving this pattern:

Pattern p1 = Pattern.compile("([\"]*[\']*[\']*[\"])");

This one solves one issue (rule 3) in this example:

String s = "Hello \"Jon\'hello\'athan\" how are 'you'";

It does extract

Jon'hello'athan

but when I add something like:

([\'])|[\"])

to the pattern, it behaves as if the whole pattern were

([\'])|[\"])

What would you recommend? Thank you.

Upvotes: 1

Views: 1930

Answers (1)

Tim Pietzcker

Reputation: 336148

As long as you don't need to deal with escaped quotes, and as long as all your quotes are correctly balanced, you can make use of a negative lookahead assertion:

(['"])((?:(?!\1).)*)\1

or, in Java:

Pattern p1 = Pattern.compile("(['\"])((?:(?!\\1).)*)\\1");

Explanation:

(['"])   # Match any quote character, capture it in group 1
(        # Match and capture in group 2:
 (?:     # Start of non-capturing group that matches...
  (?!\1) #  (as long as it's not the same quote character as in group 1)
  .      # ...any character
 )*      # any number of times.
)        # End of capturing group 2
\1       # Match the same quote as before
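
To show how the pattern above might be used, here is a minimal sketch that loops over all matches and collects group 2 (the text between the quotes). The class name QuoteExtractor and the method name extract are hypothetical and chosen only for this illustration; the pattern itself is the one given above, and the same caveats apply (no escaped quotes, balanced quotes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class QuoteExtractor {
    // Group 1 captures the opening quote, group 2 the quoted content.
    private static final Pattern QUOTED =
            Pattern.compile("(['\"])((?:(?!\\1).)*)\\1");

    // Returns every quoted substring in the input, without its surrounding quotes.
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            results.add(m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "Hello \"Jon'hello'athan\" how are 'you'";
        // prints [Jon'hello'athan, you]
        System.out.println(extract(s));
    }
}
```

Because the closing quote must be the same character as the opening one, the single quotes inside the double-quoted part are consumed as ordinary characters, while 'you' is still found as a separate match.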

Test it live on regex101.com.

Upvotes: 3
