I need to figure out a regular expression (Pattern) to be able to get characters between double quotes.
It's a little hard to explain, but here is what I want:
If I run this through said expression:
say("ex" + "ex2", "ex3");
I will then be able to get three matches, which are:
"ex", "ex2", and "ex3"
each in its own string.
I've already tried this expression:
Pattern.compile("\\\"(.*)\\\"");
But instead of giving me three different .group() results, I get a single .group() that runs from the first quote to the last and so contains "ex", "ex2", and "ex3" together.
So does anyone know an expression to give me the output I want?
Upvotes: 1
Views: 3770
Reputation: 476659
You can do this using a non-greedy approach:
"\\\"(.*?)\\\""
A non-greedy (lazy) quantifier ends the group as soon as it can. In this case, that is the moment a second double quote is found.
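To see the difference in practice, here is a minimal sketch that runs both the greedy pattern from the question and the lazy one over the sample input. The class name LazyQuoteDemo and the helper extract are made up for illustration. Note that each match is retrieved by calling find() in a loop; a single call only ever yields one match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are not from the original post.
class LazyQuoteDemo {
    // Lazy version: .*? stops at the first closing quote it can reach.
    static final Pattern LAZY = Pattern.compile("\"(.*?)\"");
    // Greedy version from the question, kept for comparison.
    static final Pattern GREEDY = Pattern.compile("\"(.*)\"");

    // Collects group(1) of every match; each find() call moves on to the next match.
    static List<String> extract(Pattern pattern, String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String input = "say(\"ex\" + \"ex2\", \"ex3\");";
        System.out.println(extract(LAZY, input));   // [ex, ex2, ex3]
        System.out.println(extract(GREEDY, input)); // [ex" + "ex2", "ex3]
    }
}
```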
Or, for instance, match all characters apart from the quote:
"(\\\"[^\"]*\\\")"
[^list] means all characters except the characters in the list.
Furthermore, you can perhaps make it more readable by avoiding the doubled backslashes and putting the quote in a character class instead:
"[\"]([^\"]*)[\"]"
Note furthermore that this doesn't work for nested quotes: if the string to match is "foo "inner" bar", it will match "foo " and not "foo "inner" bar", but I guess those are the semantics one is looking for.
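As a rough illustration of the negated character class, the sketch below captures only the text between the quotes and also shows what happens with the nested-quote input. The class name NegatedClassDemo and the helper contents are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are not from the original post.
class NegatedClassDemo {
    // Opening quote, then any run of non-quote characters (captured), then closing quote.
    static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the text between each pair of quotes, without the quotes themselves.
    static List<String> contents(String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // The sample from the question yields three separate strings.
        System.out.println(contents("say(\"ex\" + \"ex2\", \"ex3\");")); // [ex, ex2, ex3]

        // Nested quotes are not understood: the first match ends at the second quote.
        for (String part : contents("\"foo \"inner\" bar\"")) {
            System.out.println("<" + part + ">");
        }
    }
}
```

With the nested input, the first captured piece is `foo ` (with its trailing space); the pattern never sees "inner" as a nested string.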
EDIT:
In case you allow escaped double quotes as well, you can use negative lookbehind. The body has to be a lazy .*? here, because [^\"]* can never step over the escaped quote itself:
"([\"].*?(?<!\\\\)[\"])"
The (?<!\\\\) part (unescaped: (?<!\\)) means that the character immediately before the closing quote must not be a backslash.
A problem with this approach, however, is that one can also write a string such as:
"Foo\\"
This denotes the string Foo\ (ending in a real backslash), so the closing quote is preceded by a backslash even though the quote itself is not escaped.
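A small sketch of that failure mode, assuming a lookbehind pattern with a lazy .*? body (the class name LookbehindDemo and the helper matches are hypothetical). It handles an escaped quote correctly, but a string ending in an escaped backslash makes it run on into the next string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are not from the original post.
class LookbehindDemo {
    // Raw regex: "(.*?)(?<!\\)"  (the closing quote must not follow a backslash).
    // Assumption of this sketch: the body is the lazy .*? form.
    static final Pattern QUOTED = Pattern.compile("\"(.*?)(?<!\\\\)\"");

    // Returns every full match, quotes included.
    static List<String> matches(String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        // Raw input: say("a\"b")  gives one match, "a\"b", as intended.
        System.out.println(matches("say(\"a\\\"b\")"));

        // Raw input: say("Foo\\", "bar")
        // The quote after \\ is rejected, so the match runs on to the next quote
        // and the result is the single wrong match "Foo\\", "
        System.out.println(matches("say(\"Foo\\\\\", \"bar\")"));
    }
}
```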
A possible solution is to check whether the lookbehind contains an odd number of consecutive backslashes, but that is not supported by Java. The solution is instead to make the inner loop of the match more complicated:
"([\"]([^\\\\\"]*([\\\\].)*)*[\"])"
Unescaped, this regex is:
(["]([^\\"]*([\\].)*)*["])
 ^   ^       ^        ^
 |   |       |        \- trailing double quote
 |   |       \- if backslash, skip next character (for instance `\\`, `\"` or `\n`)
 |   \- match all except double quotes and backslashes
 \- beginning double quote
See this jdoodle; it reads a raw string from stdin and outputs the captured groups.
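For reference, a program along those lines might look roughly like the sketch below. This is not the code behind the link, only an approximation under my own assumptions: the class name EscapedQuoteDemo and the helper quotedStrings are invented, and it reads stdin line by line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are not from the original post.
class EscapedQuoteDemo {
    // The escape-aware pattern from the answer; raw regex: (["]([^\\"]*([\\].)*)*["])
    static final Pattern QUOTED =
            Pattern.compile("([\"]([^\\\\\"]*([\\\\].)*)*[\"])");

    // Returns group(1) of every match, i.e. each quoted string including its quotes.
    static List<String> quotedStrings(String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // Read raw text from stdin, so no Java string escaping is involved.
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String quoted : quotedStrings(line)) {
                    System.out.println(quoted);
                }
            }
        }
    }
}
```

Feeding it the raw line say("Foo\\", "a\"b") should print "Foo\\" and "a\"b" on separate lines, since both the escaped backslash and the escaped quote are consumed as backslash-plus-character pairs.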
Upvotes: 5