Reputation: 39
In String s
, there are variables and arrays. For example, s
could be a - (b+A[B[2]])*d + 3
, and the values for those variables are stored in a separate file. This is a segment of my code in which I am trying to replace any variables in my current string s
with their corresponding value from the other file.
I only want to replace the variables that are scalar variables, not arrays, and I have an ArrayList called scalars
which has all scalar variables stored. So for the given example, with a=1
, b=2
, d=3
I would want to make s
into 1 - (2+A[B[2]])*3 + 3
. scalars
also does not contain duplicate variables, so my code only works for single, non duplicate variables such as a,b,c
and not for variables such as varx
. How could I improve my code to work for every situation, or is there a better approach?
String s = expr;
if(scalars.size()>0){
int j = 0; //make so duplicates can be used , so var can be used, so it doesn't detect arrays
for(int k = 0; k<s.length(); k++){
if(Character.isLetter(s.charAt(k))){
s= s.substring(0,k) + this.scalars.get(j).value + s.substring(k+1,s.length());
j++;
}
}
}
edit: full evaluate code (not yet complete). I am trying to convert it to a String without variables so I can eventually recursively call evalNoPB when I introduce parentheses/brackets
public float evaluate() {
String s = expr;
if(scalars.size()>0){
int j = 0; //make so duplicates can be used , so var can be used, so it doesn't detect arrays
for(int k = 0; k<s.length(); k++){
if(Character.isLetter(s.charAt(k))){
s= s.substring(0,k) + this.scalars.get(j).value + s.substring(k+1,s.length());
j++;
}
}
}
System.out.println(s);
float answer = 0;
if(s==null || s.length() == 0){
return 0;
}
//one single variable or just a number
if(s.contains("+") == false && s.contains("-") == false && s.contains("*") == false && s.contains("/") == false && s.contains("[") == false &&s.contains("]") == false && s.contains("(") == false && s.contains(")") == false){
if(scalars.size() == 0){
answer = Float.parseFloat(s);
return answer;
}
answer = this.scalars.get(0).value;
System.out.println("one var/number loop");
return answer;
}
//no parentheses/brackets
if(s.contains("(") == false && s.contains(")") == false && s.contains("[") == false && s.contains("]") == false && (s.contains("+") == true || s.contains("-") == true || s.contains("*") == true || s.contains("/") == true)){
System.out.println("no parens loop");
answer = evalNoPB(s);
return answer;
}
//make compiler happy
System.out.println("no loop");
return 0;
}
//no parentheses/brackets
private float evalNoPB(String s){
float tempAns= 0;
if(s.contains("*") == false && s.contains("/") == false && s.contains("+") == false && s.contains("-") == false){
return Float.parseFloat(s);
}
if(s.length()-1>0){
int i;
boolean foundPlusMinus = false;
for(i=s.length()-1; i>0; i--){
if(s.charAt(i) == '+' || s.charAt(i) == '-'){
System.out.println(i);
foundPlusMinus = true;
break; // keep value of i for substrings
}
foundPlusMinus = false;
}
if (foundPlusMinus == false) { // for loop went through and did not find + or -
for(i=s.length()-1; i>0; i--){
if(s.charAt(i) == '*' || s.charAt(i) == '/'){
System.out.println(i);
break; // keep value of i for substrings
}
}
}
String sub1 = s.substring(0,i);
System.out.println(sub1);
String sub2 = s.substring(i+1, s.length());
System.out.println(sub2);
if(s.charAt(i) == '+'){
tempAns = evalNoPB(sub1) + evalNoPB(sub2);
} else if(s.charAt(i) == '-'){
tempAns = evalNoPB(sub1) - evalNoPB(sub2);
}else if(s.charAt(i) == '*'){
tempAns = evalNoPB(sub1) * evalNoPB(sub2);
}else if (s.charAt(i) == '/'){
float divisorCheck = evalNoPB(sub2);
if(divisorCheck!= 0){
tempAns = evalNoPB(sub1) / evalNoPB(sub2);
}else { // cannot divide by 0
throw new IllegalArgumentException("cannot divide by 0");
}
}
}
return tempAns;
}
Upvotes: 1
Views: 242
Reputation: 3200
What you are asking for is a compiler, a rather complicated program which parses input in a certain programming language and transforms it into another programming language.
In this case the input language would be expressions with numbers, scalar variables, array variables, sums, multiplications and parenthesis.
The output language would be expressions with numbers, array variables, sums, multiplications and parenthesis (note it does not contain scalar variables).
There is a lot of literature about compilers. It is a subject which spans several courses in computer science university degrees.
You can get away with simple substitutions if you define a very restricted language, as in OP. But once you start dealing with more general languages, like variables of length longer than 1, you start needing the whole architecture to build a compiler.
A compiler has several parts :
Lexical parser
The lexical parser is to break the input into tokens.
In example, consider the input "3+a".
The expected tokens are : NUMBER PLUS IDENTIFIER
I'll define the tokens in terms of regular expressions :
Here is the code for the lexical parser.
First a Token class which holds a token, with its type and the substring of the input from which the token was generated.
public class Token
{
public static enum TokenType
{
PLUS,
MINUS,
MULTIPLY,
DIVIDE,
NUMBER,
IDENTIFIER,
OPEN_PARENTHESIS,
CLOSE_PARENTHESIS,
OPEN_BRACKET,
CLOSE_BRACKET,
EOL // End of line
}
public Token( TokenType type, String value)
{
this.type = type;
this.value = value;
}
private TokenType type;
public String value;
public TokenType getType()
{
return type;
}
@Override
public String toString()
{
return value;
}
}
Next the lexical analyzer proper.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import expressionparser.Token.TokenType;
class Tokenizer
{
private final String input;
private Matcher matcher;
private Token lastToken;
public Tokenizer( String input )
{
this.input = input;
Pattern pattern =
Pattern.compile( "[+]|-|[*]|/|[0-9]+|[a-zA-Z]+|[(]|[)]|\\[|\\]|[ ]");
matcher = pattern.matcher( input );
lastToken = null;
}
public Token readAndComsumeToken( TokenType type ) throws ExpressionException
{
Token result = readToken();
if ( result.getType()!=type )
throw new ExpressionException("Erroneous exception" );
lastToken = null;
return result;
}
public Token readToken() throws ExpressionException
{
if (lastToken!=null)
return lastToken;
String value;
// Read till a non blank is received
do
{
if ( matcher==null )
{
lastToken = new Token(TokenType.EOL, "");
return lastToken;
}
if ( !matcher.find() )
throw new ExpressionException("Erroneous exception" );
value = matcher.group();
if ( matcher.end() >= input.length() )
{
// End of String
matcher = null;
}
if ( value.length()==0 )
throw new ExpressionException("Erroneous exception" );
} while ( value.equals( " " ) ||
value.equals("\t") || value.equals("\n") ||
value.equals("\f") || value.equals("\r") );
// Identify read token
TokenType type;
if ( value.equals("+") )
type = TokenType.PLUS;
else if ( value.equals("-") )
type = TokenType.MINUS;
else if ( value.equals("*") )
type = TokenType.MULTIPLY;
else if ( value.equals("/") )
type = TokenType.DIVIDE;
else if ( value.equals("(") )
type = TokenType.OPEN_PARENTHESIS;
else if ( value.equals(")") )
type = TokenType.CLOSE_PARENTHESIS;
else if ( value.equals("[") )
type = TokenType.OPEN_BRACKET;
else if ( value.equals("]") )
type = TokenType.CLOSE_BRACKET;
else
{
char firstChar = value.charAt(0);
if ( firstChar>='0' && firstChar<='9' )
type = TokenType.NUMBER;
else
type = TokenType.IDENTIFIER;
}
lastToken = new Token( type, value );
return lastToken;
}
public void consumeToken() throws IllegalStateException
{
if ( lastToken==null )
throw new IllegalStateException();
lastToken = null;
}
}
Tokenizer is based on Pattern a standard java class which can break up strings as dictated by regular expressions.
The constructor creates a Pattern matcher for the input expression.
The Pattern has an extra regular expression [ ]
not included in the previous list of tokens. This reads white spaces but they are not turned into tokens by Tokenizer
, they are ignored.
readToken
gets the next token from the matcher ignoring white spaces. It saves it in lastToken
because the syntactical parser may need to call several times readToken
for the same token and readToken
needs to always return the same token until it is consumed. If there was already a not consumed token from a previous call it is returned without getting a new one.
consumeToken
consumes the last read token by setting lastToken
to null
. It is an error to call it without having called before readToken
.
readAndComsumeToken
is a convenience method. In some cases the syntactical parser will know what the next token is before reading it. In that case it needs to read it, verify that it is the expected one and consume it. readAndConsumeToken
does it in a single call.
Syntactical parser.
For the syntactic parser first we create a context-free grammar which defines the input language.
I will use Backus-Naur Form for it.
<wholeExpression> ::= <expression> EOL
<expression> ::= <summand> <tailSum>
<summand> ::= <factor> <tailMultiplication>
<factor> ::= OPEN_PARENTHESIS <expression> CLOSE PARENTHESIS |
NUMBER |
IDENTIFIER <tailArray>
<tailSum> ::= PLUS <summand> <tailSum> |
MINUS <summand> <tailSum> |
""
<tailMultiplication> ::= MULTIPLY <factor> <tailMultiplication> |
DIVIDE <factor> <tailMultiplication> |
""
<tailArray> ::= OPEN_BRACKET <expression> CLOSE_BRACKET <tailArray> |
""
There are several ways to implement such grammar. And there are even tools to do it automatically.
I will implement it using a Java method for each non-terminal of the grammar. It is presented in the semantical translator since that is interwoven with the parser.
Semantical translator.
ExpressionParser
is the syntactical parser and the semantical translator combined. It uses Tokenizer
for doing the lexical analysis.
import expressionparser.Token.TokenType;
import java.util.List;
public class ExpressionParser
{
private Tokenizer lex;
private List<Scalar> scalars;
public String compile( String expression, List<Scalar> scalars ) throws ExpressionException
{
lex = new Tokenizer(expression);
this.scalars = scalars;
return evalWholeExpression();
}
private String evalWholeExpression( ) throws ExpressionException
{
String result = evalExpression();
lex.readAndComsumeToken(Token.TokenType.EOL);
return result;
}
private String evalExpression() throws ExpressionException
{
String left = evalSummand();
return evalTailSum( left );
}
private String evalSummand( ) throws ExpressionException
{
String left = evalFactor();
return evalTailMultiplication( left );
}
private String evalFactor() throws ExpressionException
{
Token token = lex.readToken();
if ( token.getType() == TokenType.OPEN_PARENTHESIS )
{
lex.consumeToken();
String result = evalExpression();
lex.readAndComsumeToken(TokenType.CLOSE_PARENTHESIS);
return "(" + result + ")";
}
else if ( token.getType() == TokenType.NUMBER )
{
lex.consumeToken();
return token.toString();
}
else if ( token.getType()==TokenType.IDENTIFIER )
{
lex.consumeToken();
String tailArray = evalTailArray();
if ( "".equals(tailArray) )
{
String scalarValue = evaluateScalar( token.toString() );
return scalarValue;
}
else
{
verifyIsNotScalar( token.toString() );
return token + tailArray;
}
}
else
throw new ExpressionException( "Incorrect expression" );
}
private String evalTailSum( String left ) throws ExpressionException
{
Token token = lex.readToken();
if ( token.getType()==TokenType.PLUS )
{
lex.consumeToken();
String right = evalSummand();
return evalTailSum( left + "+" + right );
}
else if ( token.getType()==TokenType.MINUS )
{
lex.consumeToken();
String right = evalSummand();
return evalTailSum( left + "-" + right );
}
else
return left;
}
private String evalTailMultiplication( String left ) throws ExpressionException
{
Token token = lex.readToken();
if ( token.getType()==TokenType.MULTIPLY )
{
lex.consumeToken();
String right = evalFactor();
return evalTailSum( left + "*" + right );
}
else if ( token.getType()==TokenType.DIVIDE )
{
lex.consumeToken();
String right = evalFactor();
return evalTailSum( left + "/" + right );
}
else
return left;
}
private String evalTailArray() throws ExpressionException
{
Token token = lex.readToken();
if ( token.getType() == TokenType.OPEN_BRACKET )
{
lex.consumeToken();
String result = evalExpression();
lex.readAndComsumeToken(TokenType.CLOSE_BRACKET);
return "[" + result + "]" + evalTailArray();
}
else
return "";
}
private String evaluateScalar( String text ) throws ExpressionException
{
assert text!=null;
for ( Scalar s : scalars )
{
if ( text.equals( s.identifier ) )
return "" + s.value;
}
throw new ExpressionException( "Incorrect expression" );
}
private void verifyIsNotScalar( String text ) throws ExpressionException
{
assert text!=null;
for ( Scalar s : scalars )
{
if ( text.equals( s.identifier ) )
throw new ExpressionException( "Incorrect expression" );
}
}
}
compile
creates a Tokenizer
lexical analyzer for the input and saves the list of Scalars. It then calls evalWholeExpression
, which returns the translation of the input, and returns such translation.
If the input is not valid an ExpressionException is thrown. Such exception would contain the error message. Which I have not made an effort to make useful. Dealing with error messages is a whole subject itself.
There is a method eval<non-terminal> for each non-terminal of our context-free grammar.
The grammar has been composed in such a way that examining the 1st token is enough to determine which branch to take for non-terminals with several options. This makes easy creating the eval methods.
In example, consider evalTailSum
. This method implements this rule :
<tailSum> ::= PLUS <summand> <tailSum> |
MINUS <summand> <tailSum> |
""
Which is used by itself and by this rule :
<expression> ::= <summand> <tailSum>
evalExpression
calls evalSummand
and gets the translation of the left side of a possible summation ( or substraction ). Then it calls evalTailSum
passing it this translation; finally returning the result of it.
<tailSum> has three branches.
So evalTailSum
gets the next token with lex.readToken()
. And then...
evalSummand
storing its result as right
. And calls 'evalTailSum' passing to it as left value left + "+" + right
. Notice how the sequence of token consumption and calls matches the terminals and non-terminals in the right side of the <tailSum> rule.left
translation received is returned as the result.The same schema is followed to create a method for each non-terminal in the context-free grammar.
All those methods are not actually translating much at all. They produce as output the same as they received from input.
But that is not the case for one important method : evalFactor
When it deals with the IDENTIFIER branch we are in the place where we need to translate variables into values as specified by a list of Scalars.
Implementing Scalar is simple :
public class Scalar
{
public Scalar( String identifier, int value )
{
this.identifier = identifier;
this.value = value;
}
public final String identifier;
public final int value;
}
In the IDENTIFIER branch evalFactor
consumes the identifier and calls evalTailArray
as per the <factor> rule in the context-free grammar.
evalTailArray
returns an empty string if there are no array subscrits, so IDENTIFIER must be a scalar variable. In this case it calls evaluateScalar
which searchs for that scalar in the list of scalars and returns its value so that it is translated, or if not found it will throw an ExpressionException.
If the result of evalTailArray
is not empty it must be an array. We do no translation here as per OP. But we do verify that there is no scalar with that name by calling verifyIsNotScalar
. If such verification fails this would be a semantic error (and so is if we have a non listed non-array variable), similar to the "undefined variable" or "incompatible type" errors you may get from a C++ or Java compiler. While all other errors thrown by our compiler would be syntactic errors similar to "syntax error : ; expected" or lexical errors like "illegal character @".
Using it
This small program uses the previous parser and tests it with several expressions. The last one being an incorrect expression.
import java.util.ArrayList;
import java.util.List;
public class Main
{
public static void main(String[] args)
{
try
{
final List<Scalar> scalars = new ArrayList<>();
scalars.add( new Scalar( "a", 1 ) );
scalars.add( new Scalar( "b", 2 ) );
scalars.add( new Scalar( "d", 3 ) );
scalars.add( new Scalar( "varx", 5 ) );
ExpressionParser parser = new ExpressionParser();
System.out.println( parser.compile("a - (b+A[B[2]])*d + 3", scalars));
System.out.println( parser.compile("5*(2+varx)+a", scalars));
System.out.println( parser.compile("B[a*384+(5+(5*(varx+3)))]+varx", scalars));
System.out.println( parser.compile("34+", scalars ) );
}
catch (ExpressionException ex)
{
System.out.println( ex.getMessage() );
}
}
}
Output is as expected :
1-(2+A[B[2]])*3+3
5*(2+5)+1
B[1*384+(5+(5*(5+3)))]+5
Incorrect expression
Upvotes: 1