Yunus Eren Güzel

Reputation: 3088

Splitting on commas with a regexp in Java

I am trying to split this input:

sum(12),sum(3,34,23),122

into these:

sum(12)

sum(3,34,23)

122

I have the following code:

        pattern = Pattern.compile("^|,|\\G(sum\\(.*\\)|[0-9]+)$|,");
        matcher = pattern.matcher(parameter);
        while(matcher.find()) {
            System.out.println("match: " + matcher.group(1));
        }
        parameter = calculateFormula(parameter); 

However, it matches

sum(12),sum(3,34,23)

What should I do to get the result I want?

Upvotes: 0

Views: 113

Answers (3)

Dewi Morgan

Reputation: 1239

The problem is that the .* in sum\\(.*\\) is greedy, so it matches all of "12),sum(3,34,23".

You can probably fix it by changing that to a non-greedy match .*?.
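To make the difference concrete, here is a minimal sketch that runs the greedy and the non-greedy version of the sum(...) part against the input from the question. The class name GreedyVsLazy and the helper findAll are made up for this illustration; only the two patterns come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class GreedyVsLazy {
    // Returns every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "sum(12),sum(3,34,23),122";
        // Greedy: .* runs on to the last ')' in the input.
        System.out.println(findAll("sum\\(.*\\)", input));
        // Non-greedy: .*? stops at the first ')' after each "sum(".
        System.out.println(findAll("sum\\(.*?\\)", input));
    }
}
```

The greedy pattern yields the single match sum(12),sum(3,34,23), while the non-greedy one yields sum(12) and sum(3,34,23) separately.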

However, personally, I'd go for something exceedingly simple, like:

"\\w+\\(.*?\\)|[^,]+"

...meaning "greedily match any word, followed by a minimal amount of stuff in brackets, or failing that, any greedy string of one or more things that aren't commas".
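As a rough sketch of how that pattern could be used with Matcher.find(), assuming flat (non-nested) calls as in the question; the class name SimpleMatch and the method parts are invented for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class SimpleMatch {
    // A word followed by a minimal bracketed part, or else a run of non-commas.
    private static final Pattern PART = Pattern.compile("\\w+\\(.*?\\)|[^,]+");

    // Collects each match of PART; the commas between the parts are skipped.
    static List<String> parts(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PART.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String part : parts("sum(12),sum(3,34,23),122")) {
            System.out.println("match: " + part);
        }
    }
}
```

Because .*? stops at the first closing parenthesis, this only holds up while calls are not nested: sum(3,sum(34,23),4) would be cut short after the inner call.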

Otherwise, the problem becomes the somewhat more complex "split on any commas that aren't contained within parens", which would involve look-ahead assertions and all sorts, and rapidly becomes a huge mess if you can have nested parens like sum(3,sum(34,23),4), or if you can't assume matched parens, and so on.

If you're going down THAT road, my recommendation would generally be to tokenise on non-word-character boundaries instead, and split into:

'sum' '(' '12' ')' ',' 'sum' '(' '3' ',' '34' ',' '23' ')' ',' '122'

...then handle each token in turn, in a state machine.
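One possible shape for that approach, as a sketch only: the tokenizer below emits one token per word run or per single punctuation character, and the "state machine" is reduced to a parenthesis depth counter that splits on commas seen at depth 0. The class and method names (TopLevelSplitter, tokenize, splitTopLevel) are hypothetical, and the code assumes the parentheses are balanced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class TopLevelSplitter {
    // One token per run of word characters, or per single non-word character.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+|\\W").matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Splits on commas that sit outside all parentheses.
    // Assumes balanced parentheses; unbalanced input is not handled.
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // the only state: how many '(' are currently open
        for (String token : tokenize(input)) {
            if (token.equals(",") && depth == 0) {
                parts.add(current.toString());
                current.setLength(0);
                continue;
            }
            if (token.equals("(")) {
                depth++;
            } else if (token.equals(")")) {
                depth--;
            }
            current.append(token);
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitTopLevel("sum(12),sum(3,34,23),122"));
        System.out.println(splitTopLevel("sum(3,sum(34,23),4),5"));
    }
}
```

Unlike the purely regex-based options, this keeps nested calls such as sum(3,sum(34,23),4) in one piece, at the cost of a few more lines.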

Upvotes: 0

Enigmadan

Reputation: 3408

How about this regex:

,(?![^\(\)]*\))

, looks for any comma

(?!...) is a negative look-ahead. "Will try to match its content from this position. If it succeeds, the look-ahead fails. If it fails, the look-ahead succeeds. It will restore the cursor position after the match."

[^...] is a negated character class. It matches any character except the ones inside it.

\(\) escapes the '(' and the ')' operators respectively so that regex understands them as characters.

* is a greedy quantifier: it repeats the preceding element as many times as possible. Here it matches every character that is not a parenthesis, stopping at the first parenthesis.

\) is an escaped closing parenthesis, matched as a literal character.

The regex, as written in English, would say:

Match a comma that is not followed by a run of non-parenthesis characters and then a closing parenthesis. In other words, match a comma that is not inside a pair of parentheses.
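A short sketch of using this regex with String.split, with the backslashes doubled for a Java string literal. The class name LookaheadSplit and the method split are invented for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original code.
class LookaheadSplit {
    // A comma not followed by non-parenthesis characters and then ')'.
    private static final String TOP_LEVEL_COMMA = ",(?![^\\(\\)]*\\))";

    static List<String> split(String input) {
        return Arrays.asList(input.split(TOP_LEVEL_COMMA));
    }

    public static void main(String[] args) {
        for (String part : split("sum(12),sum(3,34,23),122")) {
            System.out.println(part);
        }
    }
}
```

The look-ahead only inspects text up to the next parenthesis, so this relies on the calls not being nested: in sum(3,sum(34,23),4) the comma after the 3 is followed by an opening parenthesis and would be treated as a split point.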

Upvotes: 0

mael

Reputation: 2254

Use a "?" after the .* to make it non-greedy. For instance:

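import java.util.regex.Matcher;
import java.util.regex.Pattern;

String parameter = "sum(12),sum(3,34,23),122";
// .*? is non-greedy, so each sum(...) ends at its own closing parenthesis.
Pattern pattern = Pattern.compile("(sum\\(.*?\\)|[0-9]+)");
Matcher matcher = pattern.matcher(parameter);
while (matcher.find()) {
    System.out.println("match: " + matcher.group(1));
}

This prints: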

match: sum(12)
match: sum(3,34,23)
match: 122

Upvotes: 3
