GuyA

Reputation: 197

Java Pattern.compile ignoring escaped double quotes (\")

I am having a hard time figuring out the pattern to ignore escaped quotes. I want this:

    "10\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only.","blah blah" 

to match as:

   1> "10\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only."
   2> "blah blah" 

I have been trying this:

    Pattern pattern = Pattern.compile("\"[^\"]*\"");
    Matcher matcher = pattern.matcher(filteredCoupons);

and I get this:

   1> "10\"
   2> "," 
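The failure is easy to reproduce. Below is a minimal, self-contained sketch that runs the question's pattern over the sample text. The class and method names (`NaiveQuoteDemo`, `findQuoted`) are my own, not from the question. It prints the two wrong matches shown above, because `[^"]` has no notion of a backslash escape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveQuoteDemo {
    // The pattern from the question: a quote, a run of non-quote characters, a quote.
    // [^"] knows nothing about escaping, so the quote right after 10\ ends the first match.
    static final Pattern NAIVE = Pattern.compile("\"[^\"]*\"");

    // Collects every substring that NAIVE finds, left to right.
    static List<String> findQuoted(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = NAIVE.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Java literal for the sample text; \\\" yields a literal backslash followed by a quote
        String filteredCoupons = "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each."
                + " Valid until 2pm. Carryout only.\",\"blah blah\"";
        findQuoted(filteredCoupons).forEach(System.out::println);
    }
}
```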

Upvotes: 2

Views: 1648

Answers (3)

Jonny 5

Reputation: 12389

You could also use a negative lookbehind:

(?s)".*?"(?<!\\.)

as a Java String:

"(?s)\".*?\"(?<!\\\\.)"

See the test at regex101 and the test at regexplanet (click on "Java").

  • After matching a ", it looks behind (skipping that one character) to check that there is no preceding backslash
  • It is similar to ".*?(?<!\\)", but should perform better because the lookbehind is only attempted after a " has been matched
  • The (?s) flag makes the dot also match newlines
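A minimal sketch of the lookbehind pattern in use, assuming the question's sample text. The names `LookbehindQuoteDemo` and `findQuoted` are illustrative only. The second call shows the effect of `(?s)`: a quoted value containing a line break is still returned as one match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindQuoteDemo {
    // (?s)".*?"(?<!\\.) written as a Java literal:
    //   (?s)      the dot also matches line terminators
    //   ".*?"     opening quote, then lazily extend to the next quote
    //   (?<!\\.)  reject that closing quote if a backslash sits right before it
    static final Pattern LOOKBEHIND = Pattern.compile("(?s)\".*?\"(?<!\\\\.)");

    // Collects every quoted value the pattern finds, left to right.
    static List<String> findQuoted(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = LOOKBEHIND.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String filteredCoupons = "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each."
                + " Valid until 2pm. Carryout only.\",\"blah blah\"";
        findQuoted(filteredCoupons).forEach(System.out::println);

        // Thanks to (?s), a quoted value may span a line break.
        System.out.println(findQuoted("\"line one\nline two\""));
    }
}
```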

Out of interest, I benchmarked the different versions with the sample string at regexhero.net (thanks @stribizhev for this link!), since I was unsure whether regex101's step counter is accurate here.


I used only non-capturing groups for the benchmark. Interestingly, "(?:\\.|[^"])*" was almost twice as fast as the same pattern with a capturing group, "(\\.|[^"])*".
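The sketch below does not reproduce the benchmark; it only illustrates one plausible reason the capturing version costs more, without measuring anything. Both patterns find the same matches, but the capturing one also keeps group 1 up to date on every repetition, and after each match that group holds only the last repeated piece, which is of no use here. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // Same alternation, once with a capturing group and once with a non-capturing one.
    static final Pattern CAPTURING = Pattern.compile("\"(\\\\.|[^\"])*\"");
    static final Pattern NON_CAPTURING = Pattern.compile("\"(?:\\\\.|[^\"])*\"");

    // Every full match of the given pattern.
    static List<String> matches(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // What group 1 of CAPTURING holds after each match: only the last repetition survives.
    static List<String> lastCaptures(String input) {
        List<String> captures = new ArrayList<>();
        Matcher matcher = CAPTURING.matcher(input);
        while (matcher.find()) {
            captures.add(matcher.group(1));
        }
        return captures;
    }

    public static void main(String[] args) {
        String sample = "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each."
                + " Valid until 2pm. Carryout only.\",\"blah blah\"";
        System.out.println(matches(CAPTURING, sample).equals(matches(NON_CAPTURING, sample)));
        System.out.println(lastCaptures(sample));
        System.out.println(CAPTURING.matcher(sample).groupCount() + " vs "
                + NON_CAPTURING.matcher(sample).groupCount());
    }
}
```

For the sample, group 1 ends up holding just "." and "h", the final characters of the two quoted values.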

Upvotes: 0

Pshemo

Reputation: 124275

It seems your regex needs to accept non-quote characters, or quotes that have a \ before them. In that case, try

Pattern pattern = Pattern.compile("\"(\\\\.|[^\"])*\"");

This part of the regex, \\\\.|[^\"], will try to find

  • \. - any escaped character, or (|)
  • [^\"] - any non-quote character

I placed \. before [^\"] to prevent the \ from being matched by [^\"].

In other words, for text like foo\"bar" and the regex \\\\.|[^\"], you will get this matching:

foo\"bar"
^^^-matched by [^\"]

foo\"bar"
   ^^-matched by \.

foo\"bar"
     ^^^-matched by [^\"]

foo\"bar"
        ^-can't be matched by anything, since there is no \ before it
          and it is not a non-quote character
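The walk-through above can be replayed in code. This is a sketch with a hypothetical helper, `pieces`, that applies the alternation once per step from the start of the text and stops where neither alternative matches. It commits to each step and never backtracks, so it mirrors the diagram rather than the full regex engine. The swapped-order pattern is also my own addition, included only to show why \. has to come first: with [^"] first, the backslash is consumed on its own and the walk stops at the quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationWalkDemo {
    // The inner alternation from the answer: an escaped pair first, then any non-quote.
    static final Pattern ESCAPE_FIRST = Pattern.compile("\\\\.|[^\"]");
    // Hypothetical swapped order, used only for comparison.
    static final Pattern NON_QUOTE_FIRST = Pattern.compile("[^\"]|\\\\.");

    // Applies the alternation repeatedly from the start of the input, one piece per step,
    // and stops at the first position where neither alternative matches.
    static List<String> pieces(Pattern alternation, String input) {
        List<String> out = new ArrayList<>();
        Matcher matcher = alternation.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            matcher.region(pos, input.length());
            if (!matcher.lookingAt()) {
                break;
            }
            out.add(matcher.group());
            pos = matcher.end();
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "foo\\\"bar\""; // the characters foo\"bar"
        for (String piece : pieces(ESCAPE_FIRST, text)) {
            // With ESCAPE_FIRST, a two-character piece can only come from \. ; single ones from [^"]
            String by = piece.length() == 2 ? "\\." : "[^\"]";
            System.out.println(piece + "  matched by " + by);
        }
        System.out.println(pieces(NON_QUOTE_FIRST, text));
    }
}
```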

DEMO:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String filteredCoupons = "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only.\",\"blah blah\"";
Pattern pattern = Pattern.compile("\"(\\\\.|[^\"])*\"");
Matcher matcher = pattern.matcher(filteredCoupons);
while(matcher.find()){
    System.out.println(matcher.group());
}

Output:

"10\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only."
"blah blah"

Upvotes: 0

Wiktor Stribiżew

Reputation: 627292

The regex you are looking for is

"[^"\\]*(?:\\.[^"\\]*)*"

See demo

As a Java string literal:

String pattern = "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"";
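A short sketch of this pattern in use. Its shape, a run of ordinary characters followed by any number of "escape, then another ordinary run", is commonly called unrolling the loop; because [^"\\] excludes the backslash, every backslash is consumed together with the character after it. The names `UnrolledLoopDemo` and `findAll`, and the `tricky` input containing an escaped backslash before a closing quote, are my own additions for illustration. On that input the lookbehind pattern from the first answer treats the quote after `\\` as escaped and returns a single, longer match, while this pattern returns the two values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrolledLoopDemo {
    // "[^"\\]*(?:\\.[^"\\]*)*"
    //   [^"\\]*           a run of ordinary characters (neither quote nor backslash)
    //   (?:\\.[^"\\]*)*   zero or more times: one escaped character, then another ordinary run
    static final Pattern UNROLLED = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");
    // The lookbehind pattern from the first answer, kept only for comparison.
    static final Pattern LOOKBEHIND = Pattern.compile("(?s)\".*?\"(?<!\\\\.)");

    // Every full match of the given pattern, left to right.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String filteredCoupons = "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each."
                + " Valid until 2pm. Carryout only.\",\"blah blah\"";
        findAll(UNROLLED, filteredCoupons).forEach(System.out::println);

        // An escaped backslash right before a closing quote: the characters "a\\" "b"
        String tricky = "\"a\\\\\" \"b\"";
        System.out.println(findAll(UNROLLED, tricky));   // two values: "a\\" and "b"
        System.out.println(findAll(LOOKBEHIND, tricky)); // one value: "a\\" " (runs past the real closing quote)
    }
}
```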

Upvotes: 4
