MarkoSerbia

Reputation: 113

What is the right regex for my String.split()

I am splitting an equation string into a string array like this:

String[] equation_array = equation.split("(?<=[-+×÷)(])|(?=[-+×÷)(])");

Now for test string:

test = "4+(2×5)"

result is fine:

test_array = {"4", "+", "(", "2",...}

but for test string:

test2 = "(2×5)+5"

I get this string array:

test2_array = {"", "(", "2",...}

So the question is: why does splitting add an empty string before the ( in the array?

Upvotes: 3

Views: 397

Answers (4)

anubhava

Reputation: 786091

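This is known behavior of String.split() in Java 7 and earlier: a zero-width match at the start of the input produces a leading empty string. Since Java 8, a zero-width match at the beginning no longer produces one.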

To avoid this empty result use this negative lookahead based regex:

String[] equation_array = "(2×5)+5".split("(?!^)((?<=[-+×÷)(])|(?=[-+×÷)(]))");
//=> ["(", "2", "×", "5", ")", "+", "5"]

(?!^) is a negative lookahead that prevents a split at the start of the input.

Upvotes: 2

Ibrahim Najjar

Reputation: 19423

"Why does splitting add an empty string before the ( in the array?"

Because for the input (2×5)+5, the regex used for splitting matches right at the start of the string, thanks to the positive lookahead (?=[-+×÷)(]).

(2×5)+5
↖

It matches right here before the (, resulting in an empty string: "".
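A quick way to check this is to ask java.util.regex.Matcher where the first match falls. The sketch below is only an illustration: the class and method names are made up, and × and ÷ are written as the escapes \u00D7 and \u00F7 so that the file compiles regardless of source encoding. Whether split() then emits the leading "" depends on the Java version (Java 8 and later drop it for a zero-width match at the beginning), but the match position itself is the same everywhere.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeadingMatchDemo {
    // The split regex from the question (\u00D7 = multiplication sign, \u00F7 = division sign).
    static final String SPLIT_REGEX = "(?<=[-+\u00D7\u00F7)(])|(?=[-+\u00D7\u00F7)(])";

    // Returns {start, end} of the first match in the input, or null if there is none.
    static int[] firstMatch(String input) {
        Matcher m = Pattern.compile(SPLIT_REGEX).matcher(input);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        int[] a = firstMatch("(2\u00D75)+5");
        System.out.println(a[0] + ".." + a[1]); // 0..0, a zero-width match at the very start

        int[] b = firstMatch("4+(2\u00D75)");
        System.out.println(b[0] + ".." + b[1]); // 1..1, the first match is just before the +
    }
}
```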

My advice would be not to use regular expressions to parse mathematical expressions; there are more suitable algorithms for this.
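The answer does not name a specific algorithm. As one possible illustration, the tokenizing step can be done with a plain character scan instead of a regex. This is a minimal sketch, not a full expression parser: the class name, the method name and the assumption that every character outside -+×÷() belongs to a number are all mine.

```java
import java.util.ArrayList;
import java.util.List;

public class EquationTokenizer {
    // Characters treated as single-character tokens (\u00D7 = multiplication sign, \u00F7 = division sign).
    private static final String OPERATORS = "-+\u00D7\u00F7()";

    // Splits an equation into operator/parenthesis tokens and the runs of characters between them.
    static List<String> tokenize(String equation) {
        List<String> tokens = new ArrayList<>();
        StringBuilder operand = new StringBuilder();
        for (int i = 0; i < equation.length(); i++) {
            char c = equation.charAt(i);
            if (OPERATORS.indexOf(c) >= 0) {
                // Flush the operand collected so far, then emit the operator itself.
                if (operand.length() > 0) {
                    tokens.add(operand.toString());
                    operand.setLength(0);
                }
                tokens.add(String.valueOf(c));
            } else {
                operand.append(c);
            }
        }
        // Flush a trailing operand, if any.
        if (operand.length() > 0) {
            tokens.add(operand.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(2\u00D75)+5").size()); // 7
        System.out.println(tokenize("4+(2\u00D75)").size()); // 7
    }
}
```

Because tokens are only added when there is something to add, no empty strings can appear at either end of the result.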

Upvotes: 0

Bernhard Barker

Reputation: 55649

What about looking backwards to make sure we're not at the start of the string, and looking forwards to make sure we're not at the end?

"(?<=[-+×÷)(])(?!$)|(?<!^)(?=[-+×÷)(])"

Here ^ and $ are start and end of string indicators and (?!...) and (?<!...) are negative lookahead and lookbehind.
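The end-of-string guard only becomes visible when trailing empty strings are kept, because split(regex) with the default limit discards them anyway. The following sketch (class and helper names are hypothetical, × and ÷ are written as \u00D7 and \u00F7) passes a limit of -1 to compare the question's regex with the one above.

```java
import java.util.Arrays;

public class BothEndsSplitDemo {
    static final String OPS = "[-+\u00D7\u00F7)(]";
    // Regex from the question: no guards at either end.
    static final String ORIGINAL = "(?<=" + OPS + ")|(?=" + OPS + ")";
    // Regex from this answer: no split at the end (?!$) or at the start (?<!^).
    static final String BOTH_ENDS = "(?<=" + OPS + ")(?!$)|(?<!^)(?=" + OPS + ")";

    // A limit of -1 tells split() to keep trailing empty strings.
    static String[] splitKeepingTrailing(String regex, String input) {
        return input.split(regex, -1);
    }

    public static void main(String[] args) {
        String input = "4+(2\u00D75)";
        // The unguarded regex also matches after the final ")", leaving a trailing "".
        System.out.println(splitKeepingTrailing(ORIGINAL, input).length);  // 8
        // The guarded regex yields exactly the seven tokens.
        System.out.println(splitKeepingTrailing(BOTH_ENDS, input).length); // 7
        System.out.println(Arrays.toString(splitKeepingTrailing(BOTH_ENDS, "(2\u00D75)+5")));
    }
}
```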

Upvotes: 0

Pshemo

Reputation: 124275

You can add a condition that prevents the split when the position before the token is the start of the string, like this:

"(?<=[-+×÷)(])|(?<!^)(?=[-+×÷)(])"
               ^^^^^^
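Only the lookahead branch needs this guard, because the lookbehind branch can never match at position 0 (there is no preceding character). As a rough check, the sketch below compares this regex with a variant that guards the whole alternation using (?!^); the class and constant names are made up for the example, and × and ÷ are written as \u00D7 and \u00F7.

```java
import java.util.Arrays;

public class StartGuardComparison {
    static final String OPS = "[-+\u00D7\u00F7)(]";
    // Start guard applied to the whole alternation.
    static final String GUARD_ALL = "(?!^)((?<=" + OPS + ")|(?=" + OPS + "))";
    // Start guard applied to the lookahead branch only, as in this answer.
    static final String GUARD_LOOKAHEAD = "(?<=" + OPS + ")|(?<!^)(?=" + OPS + ")";

    // True if both regexes split the input into the same tokens.
    static boolean sameTokens(String input) {
        return Arrays.equals(input.split(GUARD_ALL), input.split(GUARD_LOOKAHEAD));
    }

    public static void main(String[] args) {
        System.out.println(sameTokens("(2\u00D75)+5")); // true
        System.out.println(sameTokens("4+(2\u00D75)")); // true
    }
}
```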

Upvotes: 0
