user1408286

Reputation: 27

Extract substrings between quotation marks, ignoring \"

My file contains some lines such as

"This is a string." = "This is a string's content."
" Another \" example \"" = " New example."
"My string
can have several lines." = "My string can have several lines."

I need to extract the substrings:

This is a string.
This is a string's content.
 Another \" example \"
 New example.
My string
can have several lines.
My string can have several lines.

Here's my code:

String regex = "\".*?\"\\s*?=\\s*?\".*?\"";
Pattern pattern = Pattern.compile(regex,Pattern.DOTALL);
Matcher matcher = pattern.matcher(file);

For the moment, I can get the pairs of left-hand and right-hand parts around "=". But when a substring contains \", my regex doesn't do the right job.

Can anyone help me write the correct regex, please? I tried \"^[\\"] instead of \", but it didn't work.

Thanks in advance.

Upvotes: 1

Views: 1123

Answers (3)

Mitja

Reputation: 2011

Sorry, I'm somewhere I can't test this right now, but does

\".*?(?:[^\\]\")\\s*=\\s*\".*?(?:[^\\]\")

work?

I just replaced each closing \" with (?:[^\\]\") so that a quote no longer matches when the character before it is a \.

Upvotes: 0

mogelbrod

Reputation: 2316

/"([^"\\]*(?:\\.[^"\\]*)*)"/

Source. Also see this previous question.
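
In Java the pattern above has to be written as a string literal, which means doubling every backslash. A minimal sketch of how that might look, assuming the input is already held in a String; the class name QuotedStrings and the method extract are made up for illustration, and Pattern.DOTALL is an addition so that a backslash followed by a line break is also accepted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStrings {
    // Regex (unescaped): "([^"\\]*(?:\\.[^"\\]*)*)"
    // Group 1 is the text between the quotes, escape sequences left as-is.
    private static final Pattern QUOTED = Pattern.compile(
        "\"([^\"\\\\]*(?:\\\\.[^\"\\\\]*)*)\"", Pattern.DOTALL);

    // Returns the content of every quoted string found in the input.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }
}
```

The captured text still contains the backslashes; turning \" into " would be a separate step.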

Upvotes: -1

Tim Pietzcker

Reputation: 336428

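import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> matchList = new ArrayList<String>();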
Pattern regex = Pattern.compile(
    "\"          # Match a quote\n" +
    "(           # Capture in group number 1:\n" +
    " (?:        # Match either...\n" +
    "  \\\\.     # an escaped character\n" +
    " |          # or\n" +
    "  [^\"\\\\] # any character except quotes or backslashes\n" +
    " )*         # Repeat as needed\n" +
    ")           # End of capturing group\n" +
    "\"          # Match a quote", 
    Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group(1));
} 
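
If the left and right parts around "=" should stay paired, as in the question, the same quoted-string subpattern can be used twice in one regex. This is only a sketch under that assumption; the names PairExtractor and parse are hypothetical, and the choice of a LinkedHashMap (to keep file order) is mine, not something the question asks for:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairExtractor {
    // One quoted string: a quote, then any run of escaped characters or
    // characters other than quote/backslash (captured), then a quote.
    private static final String QUOTED = "\"((?:\\\\.|[^\"\\\\])*)\"";

    // "key" = "value", with optional whitespace around the equals sign.
    // DOTALL lets an escaped line break (backslash + newline) match too.
    private static final Pattern PAIR =
        Pattern.compile(QUOTED + "\\s*=\\s*" + QUOTED, Pattern.DOTALL);

    // Maps each left-hand string to its right-hand string, in file order.
    static Map<String, String> parse(String file) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(file);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }
}
```

Calling PairExtractor.parse(file) on the sample lines from the question should give three entries, with the multi-line key kept intact. Note that a map drops duplicate keys; a list of pairs would be needed if the same left-hand string can appear twice.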

Upvotes: 3
