Reputation: 743
I'm looking for the correct regex to provide me the following results:
I currently have:
Pattern pattern = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");
... but it does not handle all of the following examples correctly. Can anyone help me with this one?
Examples:
Upvotes: 1
Views: 8698
Reputation: 46209
I'm not sure you can do this with a single Matcher.matches() call, but you can do it with a loop.
This snippet handles all the cases you mention above by calling Matcher.find() repeatedly:
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The quoted alternatives come first so a complete quoted string wins over \S+.
Pattern pattern = Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");
List<String> testStrings = Arrays.asList("foo bar", "\"foo bar\"","'foo bar'", "'foo bar", "\"'foo bar\"", "foo bar'", "foo bar\"", "\"foo bar\" \"stack overflow\"", "\"foo' bar\" \"stack overflow\" how do you do");
for (String testString : testStrings) {
    int count = 1;
    Matcher matcher = pattern.matcher(testString);
    System.out.format("* %s%n", testString);
    while (matcher.find()) {
        // Prefer the text captured inside the quotes; otherwise use the whole match.
        String token = matcher.group(1) != null ? matcher.group(1)
                : matcher.group(2) != null ? matcher.group(2)
                : matcher.group();
        System.out.format("\t* group%d: %s%n", count++, token);
    }
}
This prints:
* foo bar
    * group1: foo
    * group2: bar
* "foo bar"
    * group1: foo bar
* 'foo bar'
    * group1: foo bar
* 'foo bar
    * group1: 'foo
    * group2: bar
* "'foo bar"
    * group1: 'foo bar
* foo bar'
    * group1: foo
    * group2: bar'
* foo bar"
    * group1: foo
    * group2: bar"
* "foo bar" "stack overflow"
    * group1: foo bar
    * group2: stack overflow
* "foo' bar" "stack overflow" how do you do
    * group1: foo' bar
    * group2: stack overflow
    * group3: how
    * group4: do
    * group5: you
    * group6: do
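If you want the tokens as a list rather than printed output, the same loop can be wrapped in a helper. This is only a sketch: the class name QuoteTokenizer and the method name tokenize are made up for illustration, and the pattern is the one used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteTokenizer {
    // Same pattern as above: quoted alternatives first, then any run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");

    // Hypothetical helper: returns the tokens of input with balanced quotes stripped.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            if (matcher.group(1) != null) {
                tokens.add(matcher.group(1));   // content of a "double-quoted" token
            } else if (matcher.group(2) != null) {
                tokens.add(matcher.group(2));   // content of a 'single-quoted' token
            } else {
                tokens.add(matcher.group());    // bare token, stray quotes included
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [foo' bar, stack overflow, how, do, you, do]
        System.out.println(tokenize("\"foo' bar\" \"stack overflow\" how do you do"));
        // prints ['foo, bar]
        System.out.println(tokenize("'foo bar"));
    }
}
```

One thing to keep in mind: because the quoted alternatives use + rather than *, an empty pair of quotes is not treated as a quoted token and falls through to \S+, so it comes back as a literal "" token.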
Upvotes: 7
Reputation: 24780
Anytime you have pairings (be it quotes or braces) you leave the realm of regex and enter the realm of grammars, which need parsers.
I'll leave you with the ultimate answer to this question
UPDATE:
A little more explanation.
A grammar is usually expressed as:
construct -> [set of constructs or terminals]
For example, for quotes
doublequotedstring := " simplequotedstring "
simplequotedstring := string ' string
                    | string '
                    | ' string
                    | '
This is a simple example; you can find proper examples of grammars for quoting on the internet.
I have used aflex and ajacc for this (for Ada; for Java there are jflex and jjacc). You pass the list of identifiers to aflex, generate an output, then pass that output and the grammar to ajacc, and you get an Ada parser. It has been a long time since I used them, so I do not know whether more streamlined solutions exist, but in essence they will need the same input.
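For an input as small as the one in the question, the same idea can also be sketched as a hand-written scanner, without any generator. To be clear, this is not what aflex or jflex would produce; it is a minimal illustration that walks the input character by character. The class name QuoteScanner and the method name scan are hypothetical, and the rule for unbalanced quotes (treat them as ordinary characters) is an assumption chosen to match the test strings discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteScanner {
    // Hypothetical scanner: splits on whitespace, but keeps "..." and '...' together.
    public static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // skip separators
                continue;
            }
            if (c == '"' || c == '\'') {
                int close = input.indexOf(c, i + 1);   // look for the matching quote
                if (close > i + 1) {                   // found, and the content is non-empty
                    tokens.add(input.substring(i + 1, close));
                    i = close + 1;
                    continue;
                }
                // No usable closing quote: fall through and treat it as a normal character.
            }
            int start = i;
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                i++;                                   // consume a run of non-whitespace
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [foo' bar, stack overflow, how, do, you, do]
        System.out.println(scan("\"foo' bar\" \"stack overflow\" how do you do"));
        // prints [foo, bar"]
        System.out.println(scan("foo bar\""));
    }
}
```

The explicit loop makes it easy to add rules later, such as escape sequences inside quotes, which is where a real grammar and a generated parser start to pay off.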
Upvotes: 1