Jochen Hebbrecht

Reputation: 743

Regex to match words between single or double quotes in a string

I'm looking for a regex that gives me the following results:

I currently have:

Pattern pattern = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");

... but it does not handle all of the following examples correctly. Can anyone help me with this one?

Examples:

Upvotes: 1

Views: 8698

Answers (2)

Keppil

Reputation: 46209

I'm not sure whether you can do this with a single Matcher.matches() call, but you can do it with a loop.
This snippet handles all the cases you mention above by calling Matcher.find() repeatedly:

import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Alternatives are tried in order: "double-quoted", 'single-quoted', or any run of non-whitespace.
Pattern pattern = Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");
List<String> testStrings = Arrays.asList("foo bar", "\"foo bar\"","'foo bar'", "'foo bar", "\"'foo bar\"", "foo bar'", "foo bar\"", "\"foo bar\" \"stack overflow\"", "\"foo' bar\" \"stack overflow\" how do you do");
for (String testString : testStrings) {
    int count = 1;
    Matcher matcher = pattern.matcher(testString);
    System.out.format("* %s%n", testString);
    while (matcher.find()) {
        // Prefer the quoted content (group 1 or 2); otherwise fall back to the whole match.
        System.out.format("\t* group%d: %s%n", count++, matcher.group(1) == null ? matcher.group(2) == null ? matcher.group() : matcher.group(2) : matcher.group(1));
    }
}

This prints:

* foo bar
    * group1: foo
    * group2: bar
* "foo bar"
    * group1: foo bar
* 'foo bar'
    * group1: foo bar
* 'foo bar
    * group1: 'foo
    * group2: bar
* "'foo bar"
    * group1: 'foo bar
* foo bar'
    * group1: foo
    * group2: bar'
* foo bar"
    * group1: foo
    * group2: bar"
* "foo bar" "stack overflow"
    * group1: foo bar
    * group2: stack overflow
* "foo' bar" "stack overflow" how do you do
    * group1: foo' bar
    * group2: stack overflow
    * group3: how
    * group4: do
    * group5: you
    * group6: do
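
The same loop can be wrapped in a helper that returns the tokens as a list instead of printing them. This is only a sketch: the class name QuoteTokenizer and the method name tokenize are made up for illustration, and the pattern is the one from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class QuoteTokenizer {
    // Same pattern as in the answer: "double-quoted", 'single-quoted', or a run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                tokens.add(m.group(1));   // content between double quotes
            } else if (m.group(2) != null) {
                tokens.add(m.group(2));   // content between single quotes
            } else {
                tokens.add(m.group());    // bare word, possibly with a stray quote
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [foo' bar, stack overflow, how, do, you, do]
        System.out.println(tokenize("\"foo' bar\" \"stack overflow\" how do you do"));
    }
}
```

An unmatched quote stays part of the word it is attached to, which matches the 'foo bar and foo bar" cases in the output above.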

Upvotes: 7

SJuan76

Reputation: 24780

Any time you have paired delimiters (whether quotes or braces), you leave the realm of regex and enter the realm of grammars, which need parsers.

I'll leave you with the ultimate answer to this question

UPDATE:

A little more explanation.

A grammar is usually expressed as:

construct -> [set of constructs or terminals]

For example, for quotes

doublequotedstring := " simplequotedstring "
simplequotedstring := string ' string
                      | string '
                      | ' string
                      | '

This is a simple example; you will find proper examples of grammars for quoting on the internet.

I have used aflex and ajacc for this (for Ada; for Java there are jflex and jjacc). You pass the list of identifiers to aflex, generate an output, then pass that output and the grammar to ajacc, and you get an Ada parser. It has been a long time since I used them, so I do not know whether more streamlined solutions exist, but in essence they will need the same input.
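
For a problem this small, a hand-written scanner can stand in for a generated parser. The sketch below is not output from any of the tools mentioned above. The class name QuoteScanner, the method name scan, and the rule for unterminated quotes (an unmatched quote is treated as an ordinary character) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner; names and rules are illustrative assumptions.
class QuoteScanner {
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                       // skip separators between tokens
                continue;
            }
            if (c == '"' || c == '\'') {
                // Assumed rule: a quote opens a quoted token only if the same
                // quote character closes it later with non-empty content.
                int close = input.indexOf(c, i + 1);
                if (close > i + 1) {
                    tokens.add(input.substring(i + 1, close));
                    i = close + 1;
                    continue;
                }
            }
            // Bare word: everything up to the next whitespace character.
            int start = i;
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [foo' bar, stack overflow, how]
        System.out.println(scan("\"foo' bar\" 'stack overflow' how"));
    }
}
```

Keeping the rules in explicit branches makes it easier to add cases such as escaped quotes later, at the cost of more code than a single pattern.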

Upvotes: 1
