Replacing various variables in expression with values?

Question

In String s, there are variables and arrays. For example, s could be a - (b+A[B[2]])*d + 3, and the values for those variables are stored in a separate file. This is a segment of my code in which I am trying to replace any variables in my current string s with their corresponding value from the other file.

I only want to replace the variables that are scalar variables, not arrays, and I have an ArrayList called scalars which has all scalar variables stored. So for the given example, with a=1, b=2, d=3 I would want to make s into 1 - (2+A[B[2]])*3 + 3. scalars also does not contain duplicate variables, so my code only works for single, non duplicate variables such as a,b,c and not for variables such as varx. How could I improve my code to work for every situation, or is there a better approach?

String s = expr; 

    if(scalars.size()>0){
        int j = 0; //make so duplicates can be used , so var can be used, so it doesn't detect arrays
        for(int k = 0; k



edit: full evaluate code (not yet complete). I am trying to convert it to a String without variables so I can eventually recursively call evalNoPB when I introduce parentheses/brackets

    public float evaluate() { 

    String s = expr; 

    if(scalars.size()>0){
        int j = 0; //make so duplicates can be used , so var can be used, so it doesn't detect arrays
        for(int k = 0; k0){
        int i;
        boolean foundPlusMinus = false;
        for(i=s.length()-1; i>0; i--){
            if(s.charAt(i) == '+' || s.charAt(i) == '-'){
                System.out.println(i);
                foundPlusMinus = true;
                break; // keep value of i for substrings 
            }
            foundPlusMinus = false;
        } 

        if (foundPlusMinus == false) { // for loop went through and did not find + or -
            for(i=s.length()-1; i>0; i--){
                if(s.charAt(i) == '*' || s.charAt(i) == '/'){
                    System.out.println(i);
                    break; // keep value of i for substrings
        }
    }
    }

    String sub1 = s.substring(0,i);
    System.out.println(sub1);
    String sub2 = s.substring(i+1, s.length());
    System.out.println(sub2);

    if(s.charAt(i) == '+'){
        tempAns = evalNoPB(sub1) + evalNoPB(sub2);
    } else if(s.charAt(i) == '-'){
        tempAns = evalNoPB(sub1) - evalNoPB(sub2);
    }else if(s.charAt(i) == '*'){
        tempAns = evalNoPB(sub1) * evalNoPB(sub2);
    }else if (s.charAt(i) == '/'){
        float divisorCheck = evalNoPB(sub2);
        if(divisorCheck!= 0){
        tempAns = evalNoPB(sub1) / evalNoPB(sub2);
        }else { // cannot divide by 0 
            throw new IllegalArgumentException("cannot divide by 0");
        }
}
}
    return tempAns;

}

Anonymous Coward · Accepted Answer

What you are asking for is a compiler, a rather complicated program which parses input in a certain programming language and transforms it into another programming language.

In this case the input language would be expressions with numbers, scalar variables, array variables, sums, multiplications and parenthesis.
The output language would be expressions with numbers, array variables, sums, multiplications and parenthesis (note it does not contain scalar variables).

There is a lot of literature about compilers. It is a subject which spans several courses in computer science university degrees.
You can get away with simple substitutions if you define a very restricted language, as in OP. But once you start dealing with more general languages, like variables of length longer than 1, you start needing the whole architecture to build a compiler.

A compiler has several parts :

A lexical parser. This breaks the input in TOKENS. Each token being an identifier, an operator and similar things. The lexical analyzer also removes blanks from the input.
A syntactical parser. A set of rules that define the language in terms of TOKENS. Often done with context-free grammars.
A semantical translator. This can take a structure built in the syntactical parser or can be interwoven in the syntactical parser. It translates the input language into the output language.
And some other optional parts, line an optimizer, which are beyond the scope of this question.

Lexical parser

The lexical parser is to break the input into tokens.
In example, consider the input "3+a".
The expected tokens are : NUMBER PLUS IDENTIFIER

I'll define the tokens in terms of regular expressions :

PLUS : [+]
MINUS : -
MULTIPLY : [*]
DIVIDE : /
NUMBER : [0-9]+
IDENTIFIER : [a-zA-Z]+
OPEN_PARENTHESIS : [(]
CLOSE_PARENTHESIS : [)]
OPEN_BRACKET : $$
CLOSE_BRACKET : $$
EOL : no regular expression for End Of Line. The lexical parser will return this token when the input is exhausted.

Here is the code for the lexical parser.
First a Token class which holds a token, with its type and the substring of the input from which the token was generated.

public class Token
{
    public static enum TokenType
    {
        PLUS,
        MINUS,
        MULTIPLY,
        DIVIDE,
        NUMBER,
        IDENTIFIER,
        OPEN_PARENTHESIS,
        CLOSE_PARENTHESIS,
        OPEN_BRACKET,
        CLOSE_BRACKET,
        EOL // End of line
    }

    public Token( TokenType type, String value)
    {
        this.type = type;
        this.value = value;
    }

    private TokenType type;
    public String value;

    public TokenType getType()
    {
        return type;
    }

    @Override
    public String toString()
    {
        return value;
    }
}

Next the lexical analyzer proper.

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import expressionparser.Token.TokenType;

class Tokenizer
{
    private final String input;
    private Matcher matcher;
    private Token lastToken;

    public Tokenizer( String input )
    {
        this.input = input;
        Pattern pattern =
                Pattern.compile( "[+]|-|[*]|/|[0-9]+|[a-zA-Z]+|[(]|[)]|$$|$$|[ ]");
        matcher = pattern.matcher( input );
        lastToken = null;
    }

    public Token readAndComsumeToken( TokenType type ) throws ExpressionException
    {
        Token result = readToken();
        if ( result.getType()!=type )
            throw new ExpressionException("Erroneous exception" );
        lastToken = null;
        return result;
    }

    public Token readToken() throws ExpressionException
    {
        if (lastToken!=null)
            return lastToken;

        String value;
        // Read till a non blank is received
        do
        {
            if ( matcher==null )
            {
                lastToken = new Token(TokenType.EOL, "");
                return lastToken;
            }
            if ( !matcher.find() )
              throw new ExpressionException("Erroneous exception" );
            value = matcher.group();
            if ( matcher.end() >= input.length() )
            {
                // End of String
                matcher = null;
            }
            if ( value.length()==0 )
              throw new ExpressionException("Erroneous exception" );
        } while ( value.equals( " " ) ||
                 value.equals("\t") || value.equals("\n") ||
                 value.equals("\f") || value.equals("\r") );
        // Identify read token
        TokenType type;
        if ( value.equals("+") )
            type = TokenType.PLUS;
        else if ( value.equals("-") )
            type = TokenType.MINUS;
        else if ( value.equals("*") )
            type = TokenType.MULTIPLY;
        else if ( value.equals("/") )
            type = TokenType.DIVIDE;
        else if ( value.equals("(") )
            type = TokenType.OPEN_PARENTHESIS;
        else if ( value.equals(")") )
            type = TokenType.CLOSE_PARENTHESIS;
        else if ( value.equals("[") )
            type = TokenType.OPEN_BRACKET;
        else if ( value.equals("]") )
            type = TokenType.CLOSE_BRACKET;
        else
        {
            char firstChar = value.charAt(0);
            if ( firstChar>='0' && firstChar<='9' )
                type = TokenType.NUMBER;
            else
                type = TokenType.IDENTIFIER;
        }
        lastToken = new Token( type, value );
        return lastToken;
    }

    public void consumeToken() throws IllegalStateException
    {
        if ( lastToken==null )
            throw new IllegalStateException();
        lastToken = null;
    }
}

Tokenizer is based on Pattern a standard java class which can break up strings as dictated by regular expressions.
The constructor creates a Pattern matcher for the input expression.
The Pattern has an extra regular expression [ ] not included in the previous list of tokens. This reads white spaces but they are not turned into tokens by Tokenizer, they are ignored.

readToken gets the next token from the matcher ignoring white spaces. It saves it in lastToken because the syntactical parser may need to call several times readToken for the same token and readToken needs to always return the same token until it is consumed. If there was already a not consumed token from a previous call it is returned without getting a new one.

consumeToken consumes the last read token by setting lastToken to null. It is an error to call it without having called before readToken.

readAndComsumeToken is a convenience method. In some cases the syntactical parser will know what the next token is before reading it. In that case it needs to read it, verify that it is the expected one and consume it. readAndConsumeToken does it in a single call.

Syntactical parser.

For the syntactic parser first we create a context-free grammar which defines the input language.
I will use Backus-Naur Form for it.

    ::=  EOL
         ::=  
            ::=  
             ::= OPEN_PARENTHESIS  CLOSE PARENTHESIS |
                         NUMBER |
                         IDENTIFIER 
            ::= PLUS   |
                         MINUS   |
                         ""
 ::= MULTIPLY   |
                         DIVIDE   |
                         ""
          ::= OPEN_BRACKET  CLOSE_BRACKET  |
                         ""

There are several ways to implement such grammar. And there are even tools to do it automatically.
I will implement it using a Java method for each non-terminal of the grammar. It is presented in the semantical translator since that is interwoven with the parser.

Semantical translator.

ExpressionParser is the syntactical parser and the semantical translator combined. It uses Tokenizer for doing the lexical analysis.

import expressionparser.Token.TokenType;
import java.util.List;

public class ExpressionParser
{
    private Tokenizer lex;
    private List scalars;

    public String compile( String expression, List scalars ) throws ExpressionException
    {
        lex = new Tokenizer(expression);
        this.scalars = scalars;
        return evalWholeExpression();
    }

    private String evalWholeExpression( ) throws ExpressionException
    {
        String result = evalExpression();
        lex.readAndComsumeToken(Token.TokenType.EOL);
        return result;
    }

    private String evalExpression() throws ExpressionException
    {
        String left = evalSummand();
        return evalTailSum( left );
    }

    private String evalSummand( ) throws ExpressionException
    {
        String left = evalFactor();
        return evalTailMultiplication( left );
    }

    private String evalFactor() throws ExpressionException
    {
        Token token = lex.readToken();
        if ( token.getType() == TokenType.OPEN_PARENTHESIS )
        {
            lex.consumeToken();
            String result = evalExpression();
            lex.readAndComsumeToken(TokenType.CLOSE_PARENTHESIS);
            return "(" + result + ")";
        }
        else if ( token.getType() == TokenType.NUMBER )
        {
            lex.consumeToken();
            return token.toString();
        }
        else if ( token.getType()==TokenType.IDENTIFIER )
        {
            lex.consumeToken();
            String tailArray = evalTailArray();
            if ( "".equals(tailArray) )
            {
                String scalarValue = evaluateScalar( token.toString() );
                return scalarValue;
            }
            else
            {
                verifyIsNotScalar( token.toString() );
                return token + tailArray;
            }
        }
        else
            throw new ExpressionException( "Incorrect expression" );
    }

    private String evalTailSum( String left ) throws ExpressionException
    {
        Token token = lex.readToken();
        if ( token.getType()==TokenType.PLUS )
        {
            lex.consumeToken();
            String right = evalSummand();
            return evalTailSum( left + "+" + right );
        }
        else if ( token.getType()==TokenType.MINUS )
        {
            lex.consumeToken();
            String right = evalSummand();
            return evalTailSum( left + "-" + right );
        }
        else
            return left;
    }

    private String evalTailMultiplication( String left ) throws ExpressionException
    {
        Token token = lex.readToken();
        if ( token.getType()==TokenType.MULTIPLY )
        {
            lex.consumeToken();
            String right = evalFactor();
            return evalTailSum( left + "*" + right );
        }
        else if ( token.getType()==TokenType.DIVIDE )
        {
            lex.consumeToken();
            String right = evalFactor();
            return evalTailSum( left + "/" + right );
        }
        else
            return left;
    }

    private String evalTailArray() throws ExpressionException
    {
        Token token = lex.readToken();
        if ( token.getType() == TokenType.OPEN_BRACKET )
        {
            lex.consumeToken();
            String result = evalExpression();
            lex.readAndComsumeToken(TokenType.CLOSE_BRACKET);
            return "[" + result + "]" + evalTailArray();
        }
        else
            return "";
    }

    private String evaluateScalar( String text ) throws ExpressionException
    {
        assert text!=null;
        for ( Scalar s : scalars )
        {
            if ( text.equals( s.identifier ) )
                return "" + s.value;
        }
        throw new ExpressionException( "Incorrect expression" );
    }

    private void verifyIsNotScalar( String text ) throws ExpressionException
    {
        assert text!=null;
        for ( Scalar s : scalars )
        {
            if ( text.equals( s.identifier ) )
                throw new ExpressionException( "Incorrect expression" );
        }
    }

}

compile creates a Tokenizer lexical analyzer for the input and saves the list of Scalars. It then calls evalWholeExpression, which returns the translation of the input, and returns such translation.
If the input is not valid an ExpressionException is thrown. Such exception would contain the error message. Which I have not made an effort to make useful. Dealing with error messages is a whole subject itself.
There is a method eval for each non-terminal of our context-free grammar.
The grammar has been composed in such a way that examining the 1st token is enough to determine which branch to take for non-terminals with several options. This makes easy creating the eval methods.

In example, consider evalTailSum. This method implements this rule :

            ::= PLUS   |
                         MINUS   |
                         ""

Which is used by itself and by this rule :

::=

evalExpression calls evalSummand and gets the translation of the left side of a possible summation ( or substraction ). Then it calls evalTailSum passing it this translation; finally returning the result of it.

has three branches.

One for sumation which always starts with SUM token.
One for substraction which always starts with MINUS token.
Otherwise the empty string. Which also happens for a fixed set of possible tokens. But there is no need to deal with this here.

So evalTailSum gets the next token with lex.readToken(). And then...

If it is PLUS it consumes it. Calls evalSummand storing its result as right. And calls 'evalTailSum' passing to it as left value left + "+" + right. Notice how the sequence of token consumption and calls matches the terminals and non-terminals in the right side of the rule.
If it is MINUS it follows a similar procedure to the PLUS case.
Otherwise the empty string branch is taken and the left translation received is returned as the result.

The same schema is followed to create a method for each non-terminal in the context-free grammar.

All those methods are not actually translating much at all. They produce as output the same as they received from input.
But that is not the case for one important method : evalFactor
When it deals with the IDENTIFIER branch we are in the place where we need to translate variables into values as specified by a list of Scalars.

Implementing Scalar is simple :

public class Scalar
{
    public Scalar( String identifier, int value )
    {
        this.identifier = identifier;
        this.value = value;
    }
    public final String identifier;
    public final int value;
}

In the IDENTIFIER branch evalFactor consumes the identifier and calls evalTailArray as per the rule in the context-free grammar.
evalTailArray returns an empty string if there are no array subscrits, so IDENTIFIER must be a scalar variable. In this case it calls evaluateScalar which searchs for that scalar in the list of scalars and returns its value so that it is translated, or if not found it will throw an ExpressionException.
If the result of evalTailArray is not empty it must be an array. We do no translation here as per OP. But we do verify that there is no scalar with that name by calling verifyIsNotScalar. If such verification fails this would be a semantic error (and so is if we have a non listed non-array variable), similar to the "undefined variable" or "incompatible type" errors you may get from a C++ or Java compiler. While all other errors thrown by our compiler would be syntactic errors similar to "syntax error : ; expected" or lexical errors like "illegal character @".

Using it

This small program uses the previous parser and tests it with several expressions. The last one being an incorrect expression.

import java.util.ArrayList;
import java.util.List;

public class Main
{
    public static void main(String[] args)
    {
        try
        {
            final List scalars = new ArrayList<>();
            scalars.add( new Scalar( "a", 1 ) );
            scalars.add( new Scalar( "b", 2 ) );
            scalars.add( new Scalar( "d", 3 ) );
            scalars.add( new Scalar( "varx", 5  ) );

            ExpressionParser parser = new ExpressionParser();

            System.out.println( parser.compile("a - (b+A[B[2]])*d + 3", scalars));
            System.out.println( parser.compile("5*(2+varx)+a", scalars));
            System.out.println( parser.compile("B[a*384+(5+(5*(varx+3)))]+varx", scalars));
            System.out.println( parser.compile("34+", scalars ) );
        }
        catch (ExpressionException ex)
        {
            System.out.println( ex.getMessage() );
        }
    }
}

Output is as expected :

1-(2+A[B[2]])*3+3
5*(2+5)+1
B[1*384+(5+(5*(5+3)))]+5
Incorrect expression

Replacing various variables in expression with values?

Answers (1)

Related Questions