Roger Sanchez

Reputation: 311

Evaluating nested variables with pyparsing for a DSL

I’ve been working on building out my DSL with pyparsing and have made excellent progress. My first milestone was to evaluate expressions that contain arithmetic operators, database field references and a set of functions (Avg, Stdev, etc). In addition, I implemented assignment of expressions to variables so as to be able to build up complex expressions in a modular way. So far so good.

I have now hit my next major snag: calculating functions that take variables as arguments. Specifically, my database references (the building blocks on which calcs are performed) require specifying a Person as a dimension of the query. I don’t know the best way to force re-evaluation of the expressions assigned to these variables when they are contained within a function. A specific example that has problems:

1) CustomAvg = Avg[Height] + Avg[Weight]
2) Avg[CustomAvg]

Evaluating statement 2 does not work as expected across a list of People because CustomAvg is being resolved to a constant value.

In these scenarios, I have a list of People that I iterate over to calculate the components of CustomAvg. However, when I evaluate Avg[CustomAvg], the value of CustomAvg comes from my variable lookup dict rather than being evaluated, so effectively I am iterating over a constant value. What is the best way to introduce ‘awareness’ in my evaluation so that variables used as arguments within a function are re-evaluated rather than sourced from the lookup table? Here is the relevant code, streamlined:

class EvalConstant(object):
    var_ = {}
    def __init__(self, tokens):
        self.value = tokens[0]

    def eval(self):
        v = self.value
        if self.var_.has_key(v):
            return self.var_[v]
        else:
            return float(v)

class EvalDBref(object):
    person_ = None
    def __init__(self, tokens):
        self.value = tokens[0]

    def eval(self):
        v = self.value
        fieldRef = v.split(':')
        source = fieldRef[0]
        field = fieldRef[1] 
        rec = db[source].find_one({'Name' : self.person_}, { '_id' : 0, field : 1})
        return rec[field]

class EvalFunction(object):
    pop_ = {}
    def __init__(self, tokens):
        self.func_ = tokens.funcname
        self.field_ = tokens.arg
        self.pop_ = POPULATION

    def eval(self):
        v = self.field_.value
        fieldRef = v.split(':')
        source = fieldRef[0]
        field = fieldRef[1]

        val = self.field_.eval()

        if self.func_ == 'ZS':
            # If using zscore then fetch the field aggregates from stats
            rec = db['Stats'].find_one({'_id' : field})   
            stdev = rec['value']['stddev']          
            avg = rec['value']['avg']
            return (val - avg)/stdev
        elif self.func_ == 'Ptile':
            recs = list(db[source].find({'Name' : { '$in' : self.pop_}},{'_id' : 0, field : 1}))
            recs = [r[field] for r in recs]
            return percentileofscore(recs, val)

def assign_var(tokens):
    ev = tokens.varvalue.eval()
    EvalConstant.var_[tokens.varname] = ev

#--------------------
expr = Forward()
chars = Word(alphanums + "_-/") 
integer = Word(nums)
real = Combine(Word(nums) + "." + Word(nums))
var = Word(alphas)

assign = var("varname") + "=" + expr("varvalue")
assign.setParseAction(assign_var)

dbRef = Combine(chars + OneOrMore(":") + chars)
dbRef.setParseAction(EvalDBref)

funcNames = Keyword("ZS") | Keyword("Avg") | Keyword("Stdev")
functionCall = funcNames("funcname") + "[" + expr("arg") + "]"
functionCall.setParseAction(EvalFunction)

operand =  dbRef | functionCall | (real | integer| var).setParseAction(EvalConstant) 

signop = oneOf('+ -')
multop = oneOf('* /')
plusop = oneOf('+ -')

expr << operatorPrecedence(operand,
   [
    (signop, 1, opAssoc.RIGHT, EvalSignOp),
    (multop, 2, opAssoc.LEFT, EvalMultOp),
    (plusop, 2, opAssoc.LEFT, EvalAddOp),
   ])

EvalDBref.person_ = 'John Smith'
ret = (assign | expr).parseString(line)[0]
str(ret.eval())

Upvotes: 1

Views: 445

Answers (2)

PaulMcG

Reputation: 63762

So in this expression:

CustomAvg = Avg[Height] + Avg[Weight]

Height and Weight are supposed to be evaluated immediately, but CustomAvg is supposed to be evaluated at some time in the future? It sounds then that this is more like a definition of a function or callable, not of a new constant. I think all you have to do is to change what happens in assign_var:

def assign_var(tokens):
    # ev = tokens.varvalue.eval()
    # EvalConstant.var_[tokens.varname] = ev
    EvalConstant.var_[tokens.varname] = tokens.varvalue

Now every assigned variable becomes not a constant value, but an eval'able expression, similar to creating a lambda in Python. Then EvalConstant.eval has to detect whether it can just pass back a value or if the value itself needs to be eval'ed:

def eval(self):
    v = self.value
    if v in self.var_:  # has_key() was removed in Python 3; use 'in'
        varval = self.var_[v]
        return varval.eval() if hasattr(varval,'eval') else varval
    else:
        return float(v)
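
To see the difference without a parser or a database, here is a minimal sketch. FieldRef, Variable and the HEIGHTS data are hypothetical stand-ins for EvalDBref, EvalConstant and the real collection; only the eval() logic is the same as above. It stores one variable as an evaluated result and another as the expression object itself, then looks both up for two different people.

```python
# Hypothetical stand-ins for the question's classes: no pyparsing, no database.
HEIGHTS = {"Ann": 160.0, "Bob": 180.0}  # made-up sample data


class FieldRef(object):
    """Plays the role of EvalDBref: reads a field for the current person."""
    person_ = None

    def __init__(self, table):
        self.table = table

    def eval(self):
        return self.table[FieldRef.person_]


class Variable(object):
    """Plays the role of EvalConstant, using the modified eval() above."""
    var_ = {}

    def __init__(self, name):
        self.value = name

    def eval(self):
        v = self.value
        if v in self.var_:
            varval = self.var_[v]
            # Stored expressions are re-evaluated; plain values pass through.
            return varval.eval() if hasattr(varval, 'eval') else varval
        return float(v)


FieldRef.person_ = "Ann"
Variable.var_["Eager"] = FieldRef(HEIGHTS).eval()  # evaluated once, at assignment
Variable.var_["Deferred"] = FieldRef(HEIGHTS)      # expression object kept as-is

eager, deferred = [], []
for person in ("Ann", "Bob"):
    FieldRef.person_ = person
    eager.append(Variable("Eager").eval())
    deferred.append(Variable("Deferred").eval())

print(eager)     # [160.0, 160.0]
print(deferred)  # [160.0, 180.0]
```

The eagerly stored variable keeps Ann's value for every person, which is the symptom described in the question, while the deferred one follows whatever person_ is set to at lookup time.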

If you don't always want this to happen, then I think you may need some new syntax to distinguish when you are assigning a constant vs. defining what is essentially a lambda, maybe something like:

CustomAvg = Avg[Height] + Avg[Weight]    # store as a constant
CustomAvg *= Avg[Height] + Avg[Weight]   # store as a callable

And change assign to:

assign = var("varname") + oneOf("= *=")("assign_op") + expr("varvalue")

And then assign_var becomes:

def assign_var(tokens):
    if tokens.assign_op == '*=':
        # store expression to be eval'ed later
        EvalConstant.var_[tokens.varname] = tokens.varvalue
    else:
        # eval now and save result
        EvalConstant.var_[tokens.varname] = tokens.varvalue.eval()
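
As a rough check of the two operators outside the grammar, the sketch below calls assign_var directly. SimpleNamespace is only an assumed stand-in for the pyparsing results object (it provides the same attribute names), var_ stands in for EvalConstant.var_, and Counter is a hypothetical expression whose value changes on every eval() so the two storage modes are easy to tell apart.

```python
from types import SimpleNamespace

var_ = {}  # stands in for EvalConstant.var_


class Counter(object):
    """Hypothetical expression: returns 1, 2, 3, ... on successive eval() calls."""

    def __init__(self):
        self.n = 0

    def eval(self):
        self.n += 1
        return self.n


def assign_var(tokens):
    if tokens.assign_op == '*=':
        # store expression to be eval'ed later
        var_[tokens.varname] = tokens.varvalue
    else:
        # eval now and save result
        var_[tokens.varname] = tokens.varvalue.eval()


def lookup(name):
    """Same rule as EvalConstant.eval: re-evaluate stored expressions."""
    varval = var_[name]
    return varval.eval() if hasattr(varval, 'eval') else varval


# SimpleNamespace mimics the named results the parse action would receive.
assign_var(SimpleNamespace(varname='A', assign_op='=', varvalue=Counter()))
assign_var(SimpleNamespace(varname='B', assign_op='*=', varvalue=Counter()))

a_values = [lookup('A'), lookup('A')]
b_values = [lookup('B'), lookup('B')]
print(a_values)  # [1, 1]
print(b_values)  # [1, 2]
```

A was evaluated once at assignment, so it stays fixed; B is evaluated again on each lookup.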

Upvotes: 1

Peter Rowell

Reputation: 17713

I think your problem is scoping. Arguments to a function are normally considered to be in the same scope as locals. So the statement CustomAvg = Avg[Height] + Avg[Weight] Avg[CustomAvg] should push the current value of the local CustomAvg on to the stack, evaluate the expression, and then store the results into CustomAvg. (Or set the name CustomAvg to point to the results, if you are using a Pythonic view of names/values.)

Since the assignment happens long after the value was pushed on the eval stack, there should not be any ambiguity.

Upvotes: 0
