Reputation: 197
I am having a hard time figuring out the pattern to ignore escaped quotes. I want this:
"10\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only.","blah blah"
to match as:
1> "10\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only."
2> "blah blah"
I have been trying this:
Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(filteredCoupons);
and I get this:
1> "10\"
2> ","
Upvotes: 2
Views: 1648
Reputation: 12389
You could also use a negative lookbehind:
(?s)".*?"(?<!\\.)
as a Java String:
"(?s)\".*?\"(?<!\\\\.)"
See test at regex101; test at regexplanet (click on "Java")
After a " is matched, the lookbehind (?<!\\.) looks behind, skipping one character (the quote itself), to check that there is no preceding backslash.
Compare ".*?(?<!\\)": it performs better to look behind only after meeting a ".
The (?s) flag makes the dot also match newlines.
For interest, I benchmarked the different versions with the sample string at regexhero.net (thanks @stribizhev for this link!), because I was unsure whether the step counter of regex101 is accurate here.
I used only non-capturing groups for the benchmark. Interestingly, "(?:\\.|[^"])*" had almost double the performance of the same pattern with a capture group, "(\\.|[^"])*".
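To try the lookbehind version from Java, a minimal sketch could look like the following. The class name, the helper quotedFields, and the shortened sample string are assumptions made for illustration; only the pattern itself comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindQuoteDemo {
    // (?s) lets the dot match newlines; (?<!\\.) rejects a closing quote
    // that is directly preceded by a backslash.
    static final Pattern QUOTED = Pattern.compile("(?s)\".*?\"(?<!\\\\.)");

    // Hypothetical helper: collects every quoted field found in the input.
    static List<String> quotedFields(String input) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened version of the sample: "10\" 2 Topping Pizza","blah blah"
        String sample = "\"10\\\" 2 Topping Pizza\",\"blah blah\"";
        for (String field : quotedFields(sample)) {
            System.out.println(field);
        }
    }
}
```

With this sample the helper should return two fields, the first one still containing the escaped quote.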
Upvotes: 0
Reputation: 124275
It seems that your regex needs to accept non-quotes, or quotes which have a \
before them. In that case, try
Pattern pattern = Pattern.compile("\"(\\\\.|[^\"])*\"");
This part of the regex, \\\\.|[^\"], will try to find
\\. - any escaped character (a backslash followed by any character), | (or)
[^\"] - any non-quote character
I placed \\. before [^\"] to prevent \ from being matched by [^\"].
In other words, for text like foo\"bar"
and regex \\\\.|[^\"]
you will get this matching:
foo\"bar"
^^^-matched by [^\"]
foo\"bar"
   ^^-matched by \\.
foo\"bar"
     ^^^-matched by [^\"]
foo\"bar"
        ^-can't be matched by anything, since there is no \ before it
          and it is not a non-quote
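The trace above can be reproduced roughly in code. This is only a sketch: the class AlternationTrace and the helper steps are hypothetical names, and the helper applies the alternation one step at a time instead of relying on the * quantifier, so that each individual step becomes visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationTrace {
    // One step of the loop: an escaped character first, otherwise one non-quote.
    static final Pattern STEP = Pattern.compile("\\\\.|[^\"]");

    // Hypothetical helper: applies STEP repeatedly from the start of the input
    // and stops at the first position where neither alternative matches.
    static List<String> steps(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = STEP.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            matcher.region(pos, input.length());
            if (!matcher.lookingAt()) {
                break; // e.g. an unescaped quote
            }
            tokens.add(matcher.group());
            pos = matcher.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The text foo\"bar" written as a Java string literal
        System.out.println(steps("foo\\\"bar\""));
    }
}
```

For foo\"bar" the steps are f, o, o, \", b, a, r, and the loop stops at the final unescaped quote, which is where the closing \" of the full pattern takes over.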
DEMO:
String filteredCoupons = "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only.\",\"blah blah\"";
Pattern pattern = Pattern.compile("\"(\\\\.|[^\"])*\"");
Matcher matcher = pattern.matcher(filteredCoupons);
while (matcher.find()) {
    System.out.println(matcher.group());
}
Output:
"10\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only."
"blah blah"
Upvotes: 0
Reputation: 627292
The regex you are looking for is
"[^"\\]*(?:\\.[^"\\]*)*"
See demo
In Java,
String pattern = "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"";
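As a rough illustration of how this pattern might be used, here is a small sketch. The class name, the helper quotedFields, and the shortened sample strings are assumptions, not part of the answer. The pattern reads as: an opening quote, a run of characters that are neither quote nor backslash, then any number of "backslash plus one character, followed by another such run", and a closing quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrolledQuoteDemo {
    // Regex-level form: "[^"\\]*(?:\\.[^"\\]*)*"
    static final Pattern QUOTED =
            Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    // Hypothetical helper: collects every quoted field found in the input.
    static List<String> quotedFields(String input) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened version of the sample: "10\" 2 Topping Pizza","blah blah"
        String sample = "\"10\\\" 2 Topping Pizza\",\"blah blah\"";
        for (String field : quotedFields(sample)) {
            System.out.println(field);
        }
    }
}
```

Because a backslash is always consumed together with the character after it, a field that ends in an escaped backslash, such as "a\\", should also be matched as a whole.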
Upvotes: 4