Reputation: 21

Faster method of evaluating a boolean expression as a string in Python

I have been working on this project for a couple of months now. To give a sense of scope: the ultimate goal is to evaluate an entire digital logic circuit, similar to functional testing. This question is about the performance of analyzing a boolean expression. Every gate inside a digital circuit has an output expression in terms of the global inputs, e.g. ((A&B)|(C&D)^E). What I want to do with this expression is calculate all possible outcomes and determine how much influence each input has on the outcome.

The fastest way I have found is to build a truth table as a matrix and look at certain rows (I won't go into the specifics of that algorithm, as it's off-topic). The problem is that once the number of unique inputs goes above roughly 26-27, the memory usage is well beyond 16GB (the maximum my computer has). You might say "Buy more RAM", but every additional input doubles the memory usage, and some of the expressions I analyze have well over 200 unique inputs...

The method I use right now passes the expression string to the compile built-in. I then create an array of all the inputs found by compile (its co_names). Next, I generate a list, row by row, of "True" and "False" values randomly chosen from a sample of the possible values. That way it is equivalent to the rows of a truth table if the sample size equals the size of the range, and it lets me limit the sample size when things take too long to calculate. These values are zipped with the input names and used to evaluate the expression, which gives the initial result. After that, I go column by column through the random boolean list, flip the boolean, zip the row with the inputs again, and re-evaluate to determine whether the result changed.

So my question is this: Is there a faster way? I have included the code that performs the work. I have tried using regular expressions to find and replace, but that has always been slower (from what I've seen). Take into account that the inner for loop runs N times, where N is the number of unique inputs. I limit the outer for loop to 2^15 iterations if N > 15. So eval ends up being executed min(2^N, 2^15) * (1 + N) times...
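
To put a number on that count, here is a minimal sketch of the arithmetic. The function name eval_calls and its cap parameter are made up for illustration; they are not part of my actual code:

```python
def eval_calls(n_inputs, cap=2**15):
    """Total eval() calls: for each sampled row, one baseline
    evaluation plus one evaluation per flipped input."""
    rows = min(2**n_inputs, cap)
    return rows * (1 + n_inputs)

for n in (10, 15, 200):
    print(n, eval_calls(n))
# 10 11264
# 15 524288
# 200 6586368
```

So even with the cap on sampled rows, an expression with 200 inputs needs roughly 6.6 million eval calls.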

An update to clarify exactly what I am asking (sorry for any confusion): the algorithm/logic for calculating what I need is not the issue. I am asking for an alternative to the Python built-in eval that does the same thing faster (take a string in the format of a boolean expression, replace the variables in the string with the values in a dictionary, and then evaluate the string).

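# value is the expression as a string
comp = compile(value.strip(), '-', 'eval')
inputs = comp.co_names        # names of the variables used in the expression
control = [0]*len(inputs)     # per input: number of rows where a flip changed the result

# Sequences of random boolean values to be used
random_list = gen_rand_bits(len(inputs))

for row in random_list:
    # Baseline result for this row of input values
    valuedict = dict(zip(inputs, row))
    answer = eval(comp, valuedict)

    for column in range(len(row)):
        # Flip a single input (note: ~ is bitwise NOT, so ~0 is -1 and ~True is -2)
        row[column] = ~row[column]

        newvaluedict = dict(zip(inputs, row))
        newanswer = eval(comp, newvaluedict)

        # Restore the original value before moving to the next column
        row[column] = ~row[column]

        if answer != newanswer:
            control[column] = control[column] + 1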

Upvotes: 2

Answers (3)

poke

Reputation: 388123

My question:

Just to make sure that I understand this correctly: Your actual problem is to determine the relative influence of each variable within a boolean expression on the outcome of said expression?

OP answered:

That is what I am calculating but my problem is not with how I calculate it logically but with my use of the python eval built-in to perform evaluating.

So, this seems to be a classic XY problem. You have an actual problem, which is to determine the relative influence of each variable within a boolean expression. You have attempted to solve this in a rather inefficient way, and now that you actually “feel” the inefficiency (in both memory usage and run time), you look for ways to improve your solution instead of looking for better ways to solve your original problem.

In any case, let’s first look at how you are trying to solve this. I’m not exactly sure what gen_rand_bits is supposed to do, so I can’t really take that into account. But still, you are essentially trying out every possible combination of variable assignments and checking whether flipping the value of a single variable changes the result of the formula. “Luckily”, these are just boolean variables, so you are “only” looking at 2^N possible combinations. This means you have exponential run time. Now, O(2^N) algorithms are in theory very, very bad, while in practice it’s often somewhat okay to use them (because most have an acceptable average case and execute fast enough). However, since yours is an exhaustive algorithm, you actually have to look at every single combination and can’t take shortcuts. On top of that, compilation and evaluation using Python’s eval is apparently not fast enough to make the inefficient algorithm acceptable.

So, we should look for a different solution. Looking at your solution, one might say that a more efficient approach is not really possible, but looking at the original problem, we can argue otherwise.

You essentially want to do something similar to what compilers do as static analysis. You want to look at the source code and analyze it directly, without having to actually evaluate it. As the language you are analyzing is highly restricted (just boolean expressions with very few operators), this isn’t really that hard.

Code analysis usually works on the abstract syntax tree (or an augmented version of that). Python offers code analysis and abstract syntax tree generation with its ast module. We can use this to parse the expression and get the AST. Then based on the tree, we can analyze how relevant each part of an expression is for the whole.
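
As a quick illustration of what the ast module gives you, here is a minimal sketch that parses the example expression, prints the tree, and collects the variable names. Note that it uses mode='eval', which returns a single ast.Expression node; that is a small variation chosen for this sketch, while the default mode wraps the expression in a module body instead:

```python
import ast

expr = '((A&B)|(C&D)^~E)'

# mode='eval' parses a single expression and returns an ast.Expression node
tree = ast.parse(expr, mode='eval')

# The root operation is the bitwise OR, because & binds tighter than ^,
# and ^ binds tighter than |
print(type(tree.body.op).__name__)
# BitOr

# Dump the full tree structure for inspection
print(ast.dump(tree.body))

# Collect the distinct variable names that occur in the expression
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})
print(names)
# ['A', 'B', 'C', 'D', 'E']
```

No evaluation happens here; the names and the operator structure come straight from the parsed source.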

Now, evaluating the relevance of each variable can get quite complicated, but you can do it all by analyzing the syntax tree. I will show you a simple evaluation that supports all boolean operators but will not further check the semantic influence of expressions:

import ast

class ExpressionEvaluator:
    def __init__ (self, rawExpression):
        self.raw = rawExpression
        self.ast = ast.parse(rawExpression)

    def run (self):
        return self.evaluate(self.ast.body[0])

    def evaluate (self, expr):
        if isinstance(expr, ast.Expr):
            return self.evaluate(expr.value)
        elif isinstance(expr, ast.Name):
            return self.evaluateName(expr)
        elif isinstance(expr, ast.UnaryOp):
            if isinstance(expr.op, ast.Invert):
                return self.evaluateInvert(expr)
            else:
                raise Exception('Unknown unary operation {}'.format(expr.op))
        elif isinstance(expr, ast.BinOp):
            if isinstance(expr.op, ast.BitOr):
                return self.evaluateBitOr(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitAnd):
                return self.evaluateBitAnd(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitXor):
                return self.evaluateBitXor(expr.left, expr.right)
            else:
                raise Exception('Unknown binary operation {}'.format(expr.op))
        else:
            raise Exception('Unknown expression {}'.format(expr))

    def evaluateName (self, expr):
        return { expr.id: 1 }

    def evaluateInvert (self, expr):
        return self.evaluate(expr.operand)

    def evaluateBitOr (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitAnd (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitXor (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def join (self, a, ratioA, b, ratioB):
        d = { k: v * ratioA for k, v in a.items() }
        for k, v in b.items():
            if k in d:
                d[k] += v * ratioB
            else:
                d[k] = v * ratioB
        return d

expr = '((A&B)|(C&D)^~E)'
ee = ExpressionEvaluator(expr)
print(ee.run())
# > {'A': 0.25, 'C': 0.125, 'B': 0.25, 'E': 0.25, 'D': 0.125}

This implementation essentially generates a plain AST for the given expression and then recursively walks through the tree and evaluates the different operators. The big evaluate method just delegates the work to the type-specific methods below; it’s similar to what ast.NodeVisitor does, except that we return the analysis results from each node here. One could augment the nodes with the results instead of returning them, though.
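
For comparison, here is a rough sketch of the same occurrence-based analysis built on ast.NodeVisitor, whose visit method dispatches to visit_<NodeType> and passes the return value back. The class name InfluenceVisitor is made up for this sketch, and it parses with mode='eval' rather than the default mode:

```python
import ast

class InfluenceVisitor(ast.NodeVisitor):
    """Occurrence-based weights: each operand of a binary op gets 50%."""

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Name(self, node):
        return {node.id: 1.0}

    def visit_UnaryOp(self, node):
        if not isinstance(node.op, ast.Invert):
            raise ValueError('Unknown unary operation {}'.format(node.op))
        # Inversion does not change how much a variable matters
        return self.visit(node.operand)

    def visit_BinOp(self, node):
        if not isinstance(node.op, (ast.BitOr, ast.BitAnd, ast.BitXor)):
            raise ValueError('Unknown binary operation {}'.format(node.op))
        weights = {}
        for side in (node.left, node.right):
            for name, value in self.visit(side).items():
                weights[name] = weights.get(name, 0.0) + value * 0.5
        return weights

    def generic_visit(self, node):
        # Anything not handled above is outside the supported language
        raise ValueError('Unknown expression {}'.format(node))

result = InfluenceVisitor().visit(ast.parse('((A&B)|(C&D)^~E)', mode='eval'))
print(result)
# {'A': 0.25, 'B': 0.25, 'C': 0.125, 'D': 0.125, 'E': 0.25}
```

The weights are the same as before; only the dispatching is left to the standard library.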

In this case, the evaluation is based purely on occurrence in the expression. I don’t explicitly check for semantic effects. So for an expression A | (A & B), I get {'A': 0.75, 'B': 0.25}, although one could argue that semantically B has no relevance at all to the result (making it {'A': 1} instead). This is however something I’ll leave for you. As of now, every binary operation is handled identically (each operand getting a relevance of 50%), but that can of course be adjusted to introduce some semantic rules.
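
To see why B is semantically irrelevant in A | (A & B), a brute-force check on such a tiny expression is enough. This is only a sketch for verifying small cases; it is exactly the exponential approach, so it will not scale. flip_counts is a name made up for this sketch, and it assumes the inputs are 0/1 and the expression uses only &, |, ^ and ~:

```python
from itertools import product

def flip_counts(expression, names):
    """Count, per variable, the assignments where flipping it changes the result."""
    code = compile(expression, '<expr>', 'eval')
    counts = dict.fromkeys(names, 0)
    for values in product((0, 1), repeat=len(names)):
        env = dict(zip(names, values))
        # "& 1" keeps only the lowest bit, so results of ~ (e.g. ~0 == -1)
        # still compare as plain 0/1 truth values
        base = eval(code, {}, env) & 1
        for name in names:
            flipped = dict(env)
            flipped[name] ^= 1
            if (eval(code, {}, flipped) & 1) != base:
                counts[name] += 1
    return counts

print(flip_counts('A | (A & B)', ['A', 'B']))
# {'A': 4, 'B': 0}
```

Flipping A changes the result in all four assignments, while flipping B never does, which is what a semantic rule for this case would have to capture.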

Either way, it will not be necessary to actually test variable assignments.

Upvotes: 5

timothy

Reputation: 4487

You don't have to prepare a static table to compute this. Python is a dynamic language, so it is able to interpret and run code by itself at runtime.

In your case, I would suggest the following solution:

import random, re, time

# Step 1: Input your expression as a string
logic_exp = "A|B&(C|D)&E|(F|G|H&(I&J|K|(L&M|N&O|P|Q&R|S)&T)|U&V|W&X&Y)"

# Step 2: Retrieve all the variable names.
#         You can design a naming rule and use a regex to retrieve them.
#         Here, for example, I treat every single capital letter as a variable.
name_regex = re.compile(r"[A-Z]")

# Step 3: Replace each variable with its value.
#         You could get the values from a file or from keyboard input.
#         Here, for example, I just use a random 0 or 1.
for name in name_regex.findall(logic_exp):
    logic_exp = logic_exp.replace(name, str(random.randrange(2)))

# Step 4: Replace the operators with Python's logical operators 'and' and 'or'
logic_exp = logic_exp.replace("&", " and ")
logic_exp = logic_exp.replace("|", " or ")

# Step 5: Interpret the expression with eval(exp) and output its value.
print("expression =", logic_exp)
print("expression output =", eval(logic_exp))

This is very fast and takes very little memory. As a test, I ran the example above with 25 input variables:

expression = 1 or 1 and (1 or 1) and 0 or (0 or 0 or 1 and (1 and 0 or 0 or (0 and 0 or 0 and 0 or 1 or 0 and 0 or 0) and 1) or 0 and 1 or 0 and 1 and 0)
expression output = 1
computing time: 0.000158071517944 seconds

According to your comment, I see that you are computing all the possible combinations rather than the output for one given set of input values. If so, it becomes closely related to Boolean satisfiability (SAT), a classic NP-complete problem. I don't think there is any algorithm that can do this with a complexity lower than O(2^N). I suggest searching for the keywords "fast algorithm to solve SAT problem"; you will find a lot of interesting things.

Upvotes: 0

Abhijit

Reputation: 63767

Instead of reinventing the wheel and running into risks such as the performance and security problems you already have, it is better to look for well-accepted, industry-ready libraries.

The Logic Module of sympy would do exactly what you want to achieve without resorting to evil, ohh, I meant eval. More importantly, as the boolean expression is not a string, you don't have to care about parsing the expression, which generally turns out to be the bottleneck.

Upvotes: 0
