Reputation: 4157
I am attempting to create a parser for Java expressions, but for some reason I am unable to match floating-point values. I am using a java.util.regex.Matcher obtained from:
Matcher token = Pattern.compile(
        "(\\w[\\w\\d]*+)|" + // identifiers as group 1
        "((?:(?>[1-9][0-9]*+\\.?[0-9]*+)|(?>\\.[0-9]++))(?:[Ee][+-]?[0-9]++)?)|" + // literal numbers as group 2
        "([^\\w\\d\\s]*+)" // operators as group 3
).matcher(expression);
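One detail worth checking in this pattern: in Java regexes `\w` means `[a-zA-Z_0-9]`, so it already includes digits (which also makes `[\w\d]` equivalent to `\w`). Because group 1 comes first in the alternation and accepts a leading digit, it can consume the integer part of a number before the number alternative is ever tried. The small check below, using the hypothetical class name `WordClassDemo`, tests the identifier alternative on its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordClassDemo {
    // Same identifier alternative as group 1 of the question's pattern.
    static final Pattern IDENTIFIER = Pattern.compile("\\w[\\w\\d]*+");

    // Returns the prefix of the input accepted by the identifier alternative, or null.
    static String identifierPrefix(String input) {
        Matcher m = IDENTIFIER.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \w is shorthand for [a-zA-Z_0-9], so a digit counts as a word character.
        System.out.println(Pattern.matches("\\w", "3"));  // true
        // Hence the "identifier" alternative swallows the integer part of a number.
        System.out.println(identifierPrefix("34.78e5"));  // 34
    }
}
```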
This is intended to match an identifier, a floating-point value, or an operator (I will refine the operator part of the match later). However, it does not match the numbers the way I expect.
Below is the code that uses that expression. It is intended to take all the identifiers, numbers, and operators, register every number in vars, and put all the identifiers, the variable name standing for each number, and all the operators into tokens, in the same order as in the original string.
It does not succeed, however. For an input string like foo 34.78e5 bar -2.7 the resulting list is '[34, A, , bar, , -, 2, B, ]' with A=-78000.0 and B=-0.7, whereas it is supposed to be '[foo, A, bar, B]' with A=3478000 and B=-2.7. I believe the regex may be failing to match both parts of each number as a single token, but that may not be the case.
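Listing every match the pattern produces, together with the group that matched, shows where the odd tokens come from. This sketch (hypothetical class name `MatchTrace`) runs the first pattern from the question, unchanged, over the sample input. Group 1 takes `34` and leaves `.78e5` to group 2, which is why A comes out as 78000 in magnitude rather than 3478000. Group 3, whose `*+` quantifier allows zero characters, produces an empty match at each space and at the end of the input; the blank entries in the resulting list come from these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTrace {
    // The question's first pattern, unchanged.
    static final Pattern TOKEN = Pattern.compile(
            "(\\w[\\w\\d]*+)|"
            + "((?:(?>[1-9][0-9]*+\\.?[0-9]*+)|(?>\\.[0-9]++))(?:[Ee][+-]?[0-9]++)?)|"
            + "([^\\w\\d\\s]*+)");

    // Lists every match as "<group number>:<matched text>".
    static List<String> trace(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            int g = m.group(1) != null ? 1 : m.group(2) != null ? 2 : 3;
            out.add(g + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace("foo 34.78e5 bar -2.7"));
        // [1:foo, 3:, 1:34, 2:.78e5, 3:, 1:bar, 3:, 3:-, 1:2, 2:.7, 3:]
    }
}
```

Removing the atomic groups and possessive quantifiers leaves both effects in place, since group 1 still accepts a leading digit and `*` still permits a zero-length operator match.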
I have tried removing the atomic grouping and possessive quantifiers from the regex, but that did not change anything.
LinkedList<String> tokens = new LinkedList<String>();
HashMap<String, Double> vars = new HashMap<String, Double>();
VariableNamer varNamer = new VariableNamer();
for (Matcher token = Pattern.compile(
        "(\\w[\\w\\d]*+)|" + // variable names as group 1
        "((?:(?:[1-9][0-9]*+\\.?[0-9]*+)|(?:\\.[0-9]++))(?:[Ee][+-]?[0-9]++)?)|" + // literal numbers as group 2
        "([^\\w\\d\\s]*+)" // operators as group 3
     ).matcher(expression); token.find();) {
    if (token.group(2) != null) { // if it's a literal number, register it in vars and substitute a name for it
        String name = varNamer.next();
        if (
            tokens.size() > 0 &&
            tokens.get(tokens.size() - 1).matches("[+-]") &&
            tokens.size() > 1 ? tokens.get(tokens.size() - 2).matches("[^\\w\\d\\s]") : true
        )
            vars.put(name, tokens.pop().equals("+") ? Double.parseDouble(token.group()) : -Double.parseDouble(token.group()));
        else
            vars.put(name, Double.parseDouble(token.group()));
        tokens.addLast(name);
    } else {
        tokens.addLast(token.group());
    }
}
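For comparison, here is a minimal sketch of the same loop with four changes, under the assumptions listed in its comments. (1) Identifiers must start with a letter or underscore, because `\w` also matches digits. (2) The operator alternative matches exactly one character, so it can never match the empty string (`*+` means zero or more). (3) The sign test is parenthesized explicitly: `?:` has lower precedence than `&&`, so the original condition parses as `(size > 0 && last is +/- && size > 1) ? secondLast is an operator : true`, which is `true` whenever the previous token is not `+` or `-`, including when the list is empty. (4) The sign is removed from the end of the list: `LinkedList.pop()` removes and returns the first element, so in the question's run it removed `foo` and, since `foo` is not `+`, negated the value. The class name `Tokenizer` is hypothetical, and it takes any `Iterator<String>` for names (the question's VariableNamer would fit).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {
    // Assumptions: ASCII identifiers, plain decimal literals (no f/d/L suffixes, hex or
    // underscores) and one token per punctuation character. Whitespace matches no
    // alternative, so find() simply skips it.
    static final Pattern TOKEN = Pattern.compile(
            "(?<id>[A-Za-z_]\\w*)"                                  // identifiers cannot start with a digit
            + "|(?<num>(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][+-]?\\d+)?)" // 34, 34.78, .5, 34.78e5
            + "|(?<op>[^\\w\\s])");                                 // exactly one character, never empty

    static boolean isOperator(String token) {
        return token.matches("[^\\w\\s]");
    }

    // Replaces each number literal (with a leading unary sign folded in, if any) by the
    // next name from `names`, records its value in `vars`, and returns the token list.
    static List<String> tokenize(String expression, Iterator<String> names, Map<String, Double> vars) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            if (m.group("num") == null) {
                tokens.add(m.group());
                continue;
            }
            double value = Double.parseDouble(m.group("num"));
            int n = tokens.size();
            String last = n > 0 ? tokens.get(n - 1) : null;
            // The previous token is a sign if it is + or - and it either starts the input
            // or follows an operator other than ')'.
            boolean isSign = ("+".equals(last) || "-".equals(last))
                    && (n == 1 || (isOperator(tokens.get(n - 2)) && !tokens.get(n - 2).equals(")")));
            if (isSign) {
                tokens.remove(n - 1); // drop the sign from the END of the list
                if (last.equals("-")) {
                    value = -value;
                }
            }
            String name = names.next();
            vars.put(name, value);
            tokens.add(name);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, Double> vars = new HashMap<>();
        List<String> tokens = tokenize("foo * 34.78e5 + -2.7", Arrays.asList("A", "B").iterator(), vars);
        System.out.println(tokens);                              // [foo, *, A, +, B]
        System.out.println(vars.get("A") + " " + vars.get("B")); // 3478000.0 -2.7
    }
}
```

With this rule, the question's own sample foo 34.78e5 bar -2.7 yields [foo, A, bar, -, B] with B=2.7, because the `-` follows the identifier `bar` and is therefore read as subtraction.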
And here is VariableNamer:
import java.util.Iterator;

// Generates the names A, B, ..., Z, AA, AB, ..., AZ, BA, ..., ZZ, AAA, ... forever.
public class VariableNamer implements Iterator<String> {
    private final StringBuilder next = new StringBuilder("A");

    @Override
    public boolean hasNext() {
        return true; // the sequence never runs out
    }

    @Override
    public String next() {
        String current = next.toString();
        // Increment the last letter, then carry leftwards like an odometer.
        int last = next.length() - 1;
        next.setCharAt(last, (char) (next.charAt(last) + 1));
        for (int idx = last; idx > 0 && next.charAt(idx) > 'Z'; idx--) {
            next.setCharAt(idx, 'A');
            next.setCharAt(idx - 1, (char) (next.charAt(idx - 1) + 1));
        }
        // If the first letter overflowed as well, add a letter (Z -> AA, ZZ -> AAA).
        if (next.charAt(0) > 'Z') {
            next.setCharAt(0, 'A');
            next.insert(0, 'A');
        }
        return current;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}
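The same A, B, ..., Z, AA, AB, ... sequence can also be computed directly from a counter (bijective base-26, the scheme spreadsheet column labels use), which is handy for checking an incremental generator such as VariableNamer. A minimal sketch, with the hypothetical names `NameSequence` and `nameFor`:

```java
public class NameSequence {
    // Maps 0 -> "A", 25 -> "Z", 26 -> "AA", 27 -> "AB", ... (bijective base-26).
    static String nameFor(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0");
        }
        StringBuilder name = new StringBuilder();
        // Each step peels off the least significant "digit" 1..26 and shifts it to A..Z.
        for (int k = index + 1; k > 0; k = (k - 1) / 26) {
            name.insert(0, (char) ('A' + (k - 1) % 26));
        }
        return name.toString();
    }

    public static void main(String[] args) {
        for (int i : new int[] {0, 25, 26, 51, 52, 701, 702}) {
            System.out.println(i + " -> " + nameFor(i));
        }
        // 0 -> A, 25 -> Z, 26 -> AA, 51 -> AZ, 52 -> BA, 701 -> ZZ, 702 -> AAA
    }
}
```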
Upvotes: 0
Views: 349
Reputation: 718986
Depending on the details of your expression mini-language, it is either close to the limit of what is possible using regexes ... or beyond it. And even if you do succeed in "parsing", you will be left with the problem of mapping the "group" substrings into a meaningful expression.
My advice would be to take an entirely different approach. Either find and use an existing expression library, or implement expression parsing using a parser generator like ANTLR or JavaCC.
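To give a sense of what "mapping into a meaningful expression" involves, here is a hand-written recursive-descent evaluator for a deliberately tiny grammar. It is an illustration, not ANTLR or JavaCC output; the grammar, the class name `ExprParser`, and the choice to evaluate straight to a `double` are all assumptions. The point is that unary minus and operator precedence fall out of the grammar rules instead of being inferred by looking back over already-emitted tokens.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Recursive-descent evaluator for an assumed tiny grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := ('+' | '-') factor | NUMBER | IDENTIFIER | '(' expr ')'
public class ExprParser {
    private static final Pattern NUMBER = Pattern.compile("(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][+-]?\\d+)?");
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_]\\w*");

    private final String src;
    private final Map<String, Double> env;
    private int pos = 0;

    private ExprParser(String src, Map<String, Double> env) { this.src = src; this.env = env; }

    // Parses and evaluates src in one pass; identifiers are looked up in env.
    public static double evaluate(String src, Map<String, Double> env) {
        ExprParser p = new ExprParser(src, env);
        double value = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) throw p.error("unexpected input");
        return value;
    }

    private double expr() {
        double value = term();
        while (true) {
            if (eat('+')) value += term();
            else if (eat('-')) value -= term();
            else return value;
        }
    }

    private double term() {
        double value = factor();
        while (true) {
            if (eat('*')) value *= factor();
            else if (eat('/')) value /= factor();
            else return value;
        }
    }

    private double factor() {
        if (eat('+')) return factor();  // unary plus
        if (eat('-')) return -factor(); // unary minus is decided by the grammar, not by look-behind
        if (eat('(')) {
            double value = expr();
            if (!eat(')')) throw error("expected ')'");
            return value;
        }
        String number = match(NUMBER);
        if (number != null) return Double.parseDouble(number);
        String name = match(IDENTIFIER);
        if (name == null) throw error("expected a number, a variable or '('");
        Double value = env.get(name);
        if (value == null) throw error("unknown variable '" + name + "'");
        return value;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    // Consumes character c (after optional whitespace) if it comes next.
    private boolean eat(char c) {
        skipSpaces();
        boolean found = pos < src.length() && src.charAt(pos) == c;
        if (found) pos++;
        return found;
    }

    // Matches p at the current position; returns the consumed text, or null.
    private String match(Pattern p) {
        skipSpaces();
        Matcher m = p.matcher(src).region(pos, src.length());
        if (!m.lookingAt()) return null;
        pos = m.end();
        return m.group();
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }

    public static void main(String[] args) {
        Map<String, Double> env = new HashMap<>();
        env.put("foo", 2.0);
        System.out.println(evaluate("foo * (1 + 3) - -2", env)); // 10.0
    }
}
```

Each grammar rule becomes one method, so adding an operator or precedence level means adding or editing a rule; a parser generator automates exactly this step for larger grammars.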
Upvotes: 1