Reputation: 5568
I'm trying to extract substrings from one big string. The extracted substrings should follow these rules:
1. Text between two double quotes is extracted, e.g. "hello \"jonathan\" how are you" would yield jonathan (without the double quotes).
2. Same as 1, but with single quotes.
3. A single quote is treated as a regular character when it is surrounded by double quotes, e.g. "Hello "Jonathan how 'are'" you today" would yield the substring Jonathan how 'are' (without the double quotes).
I've been trying many combinations involving this pattern:
Pattern p1 = Pattern.compile("([\"]*[\']*[\']*[\"])");
This one handles one of the cases (rule 3). For this example:
String s = "Hello \"Jon\'hello\'athan\" how are 'you'";
it does extract
Jon'hello'athan
But when I add something like:
([\'])|[\"])
to the pattern, it treats it like the whole pattern was
([\'])|[\"])
What would you recommend? Thank you.
Upvotes: 1
Views: 1930
Reputation: 336148
As long as you don't need to deal with escaped quotes, and as long as all your quotes are correctly balanced, you can make use of a negative lookahead assertion:
(['"])((?:(?!\1).)*)\1
or, in Java:
Pattern p1 = Pattern.compile("(['\"])((?:(?!\\1).)*)\\1");
Explanation:
(['"])      # Match any quote character, capture it in group 1
(           # Match and capture in group 2:
  (?:       # Start of non-capturing group that matches...
    (?!\1)  # (as long as it's not the same quote character as in group 1)
    .       # ...any character
  )*        # any number of times.
)           # End of capturing group 2
\1          # Match the same quote as before
Test it live on regex101.com.
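For completeness, here is a minimal sketch of how the pattern could be used to collect every quoted substring with Matcher.find(). The class name QuoteExtractor and the method extract are made up for illustration, and the same assumptions apply as above: no escaped quotes and correctly balanced quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class QuoteExtractor {
    // Group 1 is the opening quote character, group 2 is the text between the quotes.
    private static final Pattern QUOTED =
            Pattern.compile("(['\"])((?:(?!\\1).)*)\\1");

    // Returns the contents of every quoted section, without the enclosing quotes.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "Hello \"Jon'hello'athan\" how are 'you'";
        System.out.println(extract(s)); // prints [Jon'hello'athan, you]
    }
}
```

Because the lookahead only rejects the quote character that opened the match, the other kind of quote is consumed as an ordinary character, which is what rule 3 in the question asks for.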
Upvotes: 3