user2031395

Reputation: 11

Splitting string expression into tokens

My input looks like this:

    String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";

I want the output to be:

1.33E+4
helloeeee
4
5
2
10
2
5
10
2

But the output I actually get is:

1.33, 4, helloeeee, 4, 5, 2, 10, 2, 5, 10, 2

I want the exponent to stay attached to its number after splitting, so that "1.33E+4" comes out as a single token.

Here is my code:

    String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
    List<String> tokensOfExpression = new ArrayList<String>();
    String[] tokens=str.split("[(?!E)+*\\-/()]+");
    for(String token:tokens)
    {   
         System.out.println(token);
         tokensOfExpression.add(token);
    }
    if(tokensOfExpression.get(0).equals(""))
    {
         tokensOfExpression.remove(0);
    }

Upvotes: 0

Views: 1263

Answers (4)

user207421

Reputation: 310883

You can't do that with a single regular expression, because of the ambiguities introduced by floating-point constants in scientific notation, and in any case you need to know which token is which without having to re-scan them. You've also misstated your requirement, as you certainly need the binary operators in the output as well. You need to write both a scanner and a parser. Have a look for 'recursive descent expression parser' and 'Dijkstra shunting-yard algorithm'.
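To illustrate the scanner half of that advice, here is a minimal hand-written sketch. It is only one possible shape, not a definitive implementation: the class name `ExpressionScanner`, the `Kind` enum, and the `Token` class are hypothetical names chosen for this example, and the accepted number syntax (digits, optional fraction, optional exponent) is an assumption about what the input may contain.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpressionScanner {

    // Hypothetical token kinds; the names are illustrative only.
    public enum Kind { NUMBER, IDENTIFIER, OPERATOR }

    public static final class Token {
        public final Kind kind;
        public final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ":" + text; }
    }

    public static List<Token> scan(String s) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++; // skip blanks
            } else if (Character.isDigit(c)) {
                // Integer part
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                // Optional fraction
                if (i < s.length() && s.charAt(i) == '.') {
                    i++;
                    while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                }
                // Optional exponent, consumed only if at least one digit follows
                if (i < s.length() && (s.charAt(i) == 'E' || s.charAt(i) == 'e')) {
                    int j = i + 1;
                    if (j < s.length() && (s.charAt(j) == '+' || s.charAt(j) == '-')) j++;
                    if (j < s.length() && Character.isDigit(s.charAt(j))) {
                        while (j < s.length() && Character.isDigit(s.charAt(j))) j++;
                        i = j;
                    }
                }
                tokens.add(new Token(Kind.NUMBER, s.substring(start, i)));
            } else if (Character.isLetter(c)) {
                while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
                tokens.add(new Token(Kind.IDENTIFIER, s.substring(start, i)));
            } else if ("+-*/()".indexOf(c) >= 0) {
                i++;
                tokens.add(new Token(Kind.OPERATOR, String.valueOf(c)));
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : scan("-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2")) {
            System.out.println(t);
        }
    }
}
```

Because each token carries its kind, a parser (recursive descent or shunting-yard) can consume the list without re-scanning the text. Deciding whether a leading `-` is unary is left to that parser in this sketch.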

Upvotes: 1

Evgeniy Dorofeev

Reputation: 136002

It's easier to achieve the result with a Matcher that finds the tokens, rather than a split on the delimiters:

    String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
    Matcher m = Pattern.compile("\\d+\\.\\d*E[+-]?\\d+|\\w+").matcher(str);
    while(m.find()) {
        System.out.println(m.group());
    }

prints

1.33E+4
helloeeee
4
5
2
10
2
5
10
2

Note that the pattern needs some testing against other floating-point forms (for example numbers without a decimal point, or a lowercase e), but it is easy to adjust.
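As one possible adjustment, the sketch below widens the number part so that forms such as `1E5`, `.5e-3`, and plain integers are also matched as numbers. The class name `NumberTokens` and the exact set of accepted number forms are assumptions made for this example, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberTokens {

    // Number: digits with optional fraction, or a bare fraction like ".5",
    // followed by an optional exponent; otherwise fall back to a word.
    private static final Pattern TOKEN = Pattern.compile(
            "(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][+-]?\\d+)?|\\w+");

    public static List<String> tokens(String str) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(str);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2"));
        System.out.println(tokens("1E5+.5e-3*x"));
    }
}
```

One side effect of trying the number alternative first is that text such as `2x` is split into `2` and `x`, whereas `\w+` alone would keep it together.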

Upvotes: 0

jsj

Reputation: 9381

I would first replace the E+ with a symbol that is not ambiguous such as

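    // replace() takes a literal; replaceAll("E+", ...) would read "E+" as a regex (one or more E's)
    str = str.replace("E+", "SCINOT");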

You can then parse with StringTokenizer, replacing the SCINOT symbol when you need to evaluate the number represented in scientific notation.
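A rough sketch of this placeholder idea follows. The class name `PlaceholderTokens` and the two placeholder strings are hypothetical, and the sketch assumes they never occur in the input. It also tightens the replacement slightly beyond what is described above: the exponent marker is only replaced when it sits between digits, and `E-` is handled as well as `E+`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class PlaceholderTokens {

    // Assumed placeholders; they must not appear anywhere in the input.
    private static final String PLUS = "SCINOTP";
    private static final String MINUS = "SCINOTM";

    public static List<String> tokens(String str) {
        // Hide the exponent sign so it is not mistaken for an operator.
        String hidden = str
                .replaceAll("(?<=\\d)E\\+(?=\\d)", PLUS)
                .replaceAll("(?<=\\d)E-(?=\\d)", MINUS);

        // StringTokenizer skips runs of delimiters, so no empty tokens appear.
        StringTokenizer st = new StringTokenizer(hidden, "+-*/()");
        List<String> result = new ArrayList<>();
        while (st.hasMoreTokens()) {
            // Restore the original exponent notation in each token.
            result.add(st.nextToken().replace(PLUS, "E+").replace(MINUS, "E-"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2"));
    }
}
```

Like the split-based approaches, this discards the operators, so it only fits if the operands alone are needed.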

Upvotes: 1

Aditi

Reputation: 1188

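Try a negative lookbehind, so that an operator directly after E is not treated as a delimiter:

    // The quantifier belongs on the character class, not on the lookbehind
    String[] tokens = str.split("(?<!E)[*\\-/()+]+");

The first element is still an empty string when the input starts with an operator, so keep the check that removes it.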

Upvotes: 0
