Reputation: 3088
I am trying to split this input:
sum(12),sum(3,34,23),122
into these:
sum(12)
sum(3,34,23)
122
I have the following code:
pattern = Pattern.compile("^|,|\\G(sum\\(.*\\)|[0-9]+)$|,");
matcher = pattern.matcher(parameter);
while(matcher.find()) {
System.out.println("match: " + matcher.group(1));
}
parameter = calculateFormula(parameter);
However, it matches:
sum(12),sum(3,34,23)
What should I do to get the result I want?
Upvotes: 0
Views: 113
Reputation: 1239
Your problem is that the `.*` in `sum\\(.*` is greedy, and matches all of "12),sum(3,34,23".
You can probably fix it by changing that to a non-greedy match, `.*?`.
However, personally, I'd go for something exceedingly simple, like:
"\\w+\\(.*?\\)|[^,]+"
...meaning "greedily match any word, followed by a minimal amount of stuff in brackets, or failing that, any greedy string of one or more things that aren't commas".
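As a rough illustration of that pattern in use, here is a minimal sketch. The class name SimpleArgMatcher and the method matchArgs are made up for this example, and it assumes the input never contains nested parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class SimpleArgMatcher {
    // Either a word followed by a minimal bracketed group, or a run of non-commas.
    private static final Pattern ARG = Pattern.compile("\\w+\\(.*?\\)|[^,]+");

    // Collects every match of ARG, in order of appearance.
    static List<String> matchArgs(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = ARG.matcher(input);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [sum(12), sum(3,34,23), 122]
        System.out.println(matchArgs("sum(12),sum(3,34,23),122"));
    }
}
```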
Otherwise, the problem becomes the somewhat more complex "split on any commas that aren't contained within parens", which would involve look-ahead assertions and the like, and rapidly becomes a huge mess if you can have nested parens like `sum(3,sum(34,23),4)`, or if you can't assume matched parens, and so on.
If you're going down THAT road, my recommendation would generally be to tokenise on non-word-character boundaries instead, and split into:
'sum' '(' '12' ')' ',' 'sum' '(' '3' ',' '34' ',' '23' ')' ',' '122'
...then handle each token in turn, in a state machine.
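A much-reduced version of that idea might look like the sketch below. It is not a full tokeniser: it walks the input one character at a time and keeps only a parenthesis depth counter as its state, splitting on commas seen at depth zero. The names TopLevelSplitter and splitTopLevel are hypothetical, and throwing on unbalanced parentheses is just one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: splits on commas that are outside all parentheses.
class TopLevelSplitter {
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // how many '(' are currently open

        for (char c : input.toCharArray()) {
            if (c == ',' && depth == 0) {
                // Top-level comma: finish the current piece and start a new one.
                parts.add(current.toString());
                current.setLength(0);
                continue;
            }
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (depth == 0) {
                    throw new IllegalArgumentException("unbalanced ')'");
                }
                depth--;
            }
            current.append(c);
        }
        if (depth != 0) {
            throw new IllegalArgumentException("unbalanced '('");
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        // prints [sum(3,sum(34,23),4), 122]
        System.out.println(splitTopLevel("sum(3,sum(34,23),4),122"));
    }
}
```

Because the depth is counted rather than pattern-matched, nested calls need no special handling.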
Upvotes: 0
Reputation: 3408
How about this regex:
,(?![^\(\)]*\))
`,` looks for any comma.
`(?!...)` is a negative look-ahead. "Will try to match its content from this position. If it succeeds, the look-ahead fails. If it fails, the look-ahead succeeds. It will restore the cursor position after the match."
`[^...]` is a negated character class. It matches any character except the ones inside it.
`\(\)` escapes the '(' and ')' operators so that the regex treats them as literal characters.
`*` is a greedy repeater. It keeps matching until the preceding element stops occurring. In this case it matches all characters that are not parentheses until it finds a parenthesis.
`\)` is an escaped operator, now treated as a literal character.
The regex, written in English, would say:
Match any comma that is not followed by a run of non-parenthesis characters and then a closing parenthesis; in other words, a comma that is not inside a pair of parentheses.
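In Java, this regex can be passed straight to String.split, with the backslash doubled inside the string literal. The following is only a sketch: the class LookaheadSplit and the method splitArgs are invented names, and the parentheses inside the character class are left unescaped, which Java accepts.

```java
import java.util.Arrays;

// Hypothetical helper class wrapping the look-ahead split.
class LookaheadSplit {
    // Split on commas that are not followed by "non-parens, then ')'".
    static String[] splitArgs(String input) {
        return input.split(",(?![^()]*\\))");
    }

    public static void main(String[] args) {
        // prints [sum(12), sum(3,34,23), 122]
        System.out.println(Arrays.toString(splitArgs("sum(12),sum(3,34,23),122")));
    }
}
```

Note that this assumes parentheses are never nested. With input like sum(3,sum(34,23),4), the comma after the 3 is followed by an opening parenthesis before any closing one, so the look-ahead treats it as a top-level comma and splits there.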
Upvotes: 0
Reputation: 2254
Use a "?" after the ".*" to make it non-greedy. For instance:
String parameter = "sum(12),sum(3,34,23),122";
// ".*?" matches as little as possible, so each sum(...) stops at its own closing parenthesis
Pattern pattern = Pattern.compile("(sum\\(.*?\\)|[0-9]+)");
Matcher matcher = pattern.matcher(parameter);
while (matcher.find()) {
System.out.println("match: " + matcher.group(1));
}
This will print:
match: sum(12)
match: sum(3,34,23)
match: 122
Upvotes: 3