Saving state of closure in Groovy

Question

I would like to use a Groovy closure to process data coming from a SQL table. For each new row, the computation would depend on what has been computed previously. However, new rows may become available on further runs of the application, so I would like to be able to reload the closure, initialised with the intermediate state it had when the closure was last executed in the previous run of the application.

For example, a closure computing the moving average over 3 rows could be implemented like this:

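// sql is a groovy.sql.Sql instance created elsewhere in the script
def prev2Val = null
def prevVal = null
def prevId = null

Closure c = { row ->
    // debug output: the state carried over from the previous rows
    println([prev2Val, prevVal, prevId])

    def latestVal = row['val']

    // once two previous values are known, store the 3-row mean against the middle row
    if (prev2Val != null) {
        def movMean = (prev2Val + prevVal + latestVal) / 3
        sql.execute("INSERT INTO output(id, val) VALUES (?, ?)", [prevId, movMean])
    }

    sql.execute("UPDATE test_data SET processed=TRUE WHERE id=?", [row['id']])

    // shift the window by one row
    prev2Val = prevVal
    prevVal = latestVal
    prevId = row['id']
}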

test_data has 3 columns: id (auto-incremented primary key), val and processed. For each row, a moving mean of the current value and the two previous values is calculated and inserted into the output table, against the id of the previous (middle) row. Processed rows are flagged with processed=TRUE.

If all the data was available from the start, this could be called like this:

sql.eachRow("SELECT id, val FROM test_data WHERE processed=FALSE ORDER BY id", c)

The problem comes when new rows become available after the application has already been run. This can be simulated by processing a small batch each time (e.g. using LIMIT 5 in the previous statement).
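The restart problem can be reproduced without a database. In this sketch, plain lists stand in for the tables, and makeStep is a hypothetical factory that returns a fresh copy of the closure above with an empty window, as a new run of the application would. Five rows are processed in batches of three (as if running the query with LIMIT 3 twice). One pass keeps the same closure between batches. The other creates a new closure for each batch.

```groovy
// Hypothetical factory: every call returns a closure with its own empty window,
// which is the state a fresh run of the application starts from.
// The output list stands in for the output table.
Closure makeStep(List output) {
    def prev2Val = null
    def prevVal = null
    def prevId = null
    return { Map row ->
        if (prev2Val != null) {
            // mean of the two previous values and the current one, keyed by the middle row
            output << [prevId, (prev2Val + prevVal + row.val) / 3]
        }
        prev2Val = prevVal
        prevVal = row.val
        prevId = row.id
    }
}

def rows = (1..5).collect { [id: it, val: it * 10] }
def batches = rows.collate(3)          // [[rows 1-3], [rows 4-5]]

// One closure reused for every batch: the window survives between batches
def kept = []
def step = makeStep(kept)
batches.each { batch -> batch.each(step) }

// A new closure per batch: the window is lost, as after a restart
def restarted = []
batches.each { batch -> batch.each(makeStep(restarted)) }

println kept        // [[2, 20], [3, 30], [4, 40]]
println restarted   // [[2, 20]]
```

The restarted version never produces values for ids 3 and 4, because the two rows needed to compute them were processed in the previous batch.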

I would like to be able to dump the full state of the closure at the end of the execution of eachRow (saving the intermediate data somewhere in the database, for example) and re-initialise it when I re-run the whole application (by loading those intermediate variables from the database).
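If the state can be obtained as a map, one way to keep it in the database is to serialise it to a single JSON text value. This sketch uses Groovy's groovy.json.JsonOutput and groovy.json.JsonSlurper. The stateToJson/stateFromJson names, and the idea of a single text column that holds the state, are assumptions for illustration. The SQL itself is left out so the sketch runs on its own.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical helpers: the state map is flattened into one JSON string, which
// could be written to a single text column (the table itself is not shown here)
String stateToJson(Map state) {
    JsonOutput.toJson(state)
}

Map stateFromJson(String json) {
    // No stored value (null or empty text) means "first run": start with no state
    json ? (Map) new JsonSlurper().parseText(json) : [:]
}

def json = stateToJson([prev2Val: 1, prevVal: 2, prevId: 6])
println json                   // {"prev2Val":1,"prevVal":2,"prevId":6}
def restored = stateFromJson(json)
```

JSON covers numbers, strings, booleans, lists, maps and null. Other types (dates, for instance) would not come back as the same type without extra conversion.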

In this particular example, I can do this manually by storing the values of prev2Val, prevVal and prevId, but I'm looking for a generic solution where knowing exactly which variables are used wouldn't be necessary.

Perhaps something like c.getState(), which would return (for example) [ prev2Val: 1, prevVal: 2, prevId: 6], and c.setState([ prev2Val: 1, prevVal: 2, prevId: 6]), which I could call the next time the application is executed (if a state has been stored).

I would also need to exclude sql from the list. It seems this can be done using c.@sql=null.

I realise this is unlikely to work in the general case, but I'm looking for something sufficiently generic for most cases. I've tried to dehydrate, serialize and rehydrate the closure, as described in this Groovy issue, but I'm not sure how to save and store all the @ fields in a single operation.
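Closures can capture local variables declared with def, as they do with prev2Val, prevVal and prevId here. For those closures, one possible route to a generic getState/setState is reflection over the generated closure class. This is only a sketch. It depends on an implementation detail of the Groovy compiler, not a documented API: each captured local becomes a non-static field of type groovy.lang.Reference, whose value can be read with get() and replaced with set(). The capturedFields/getState/setState names are hypothetical. The exclude parameter does the job of c.@sql=null.

```groovy
import java.lang.reflect.Field
import java.lang.reflect.Modifier

// Hypothetical helpers, NOT a Groovy API. They assume (compiler implementation
// detail) that each local variable captured by a closure is stored in a
// non-static field of type groovy.lang.Reference on the generated closure class.
List<Field> capturedFields(Closure cl) {
    cl.getClass().declaredFields.toList().findAll { Field f ->
        !Modifier.isStatic(f.modifiers) && f.type == Reference
    }
}

// Snapshot of the captured variables, skipping names such as 'sql'
Map getState(Closure cl, Collection<String> exclude = []) {
    capturedFields(cl).findAll { !(it.name in exclude) }.collectEntries { Field f ->
        f.setAccessible(true)
        [(f.name): ((Reference) f.get(cl)).get()]
    }
}

// Writes saved values back into the captured variables of a closure
void setState(Closure cl, Map state) {
    capturedFields(cl).findAll { state.containsKey(it.name) }.each { Field f ->
        f.setAccessible(true)
        ((Reference) f.get(cl)).set(state[f.name])
    }
}

// Usage with the window variables from the question and a stand-in for sql
def prev2Val = null
def prevVal = null
def prevId = null
def sql = new Object()   // placeholder for a groovy.sql.Sql connection
Closure step = { Map row ->
    assert sql != null   // referencing sql makes the closure capture it too
    prev2Val = prevVal
    prevVal = row.val
    prevId = row.id
}
[[id: 5, val: 1], [id: 6, val: 2]].each(step)
def saved = getState(step, ['sql'])   // prev2Val=1, prevVal=2, prevId=6
```

On the next run, the closure is recreated from the same source, so the field names match. setState(step, saved) can then be called before the first row is processed. Because the field layout is a compiler detail, it may change between Groovy versions. This approach also does not see undeclared variables, which live in the script binding instead.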

Is this possible? Is there a better way to remember state between executions, assuming the list of variables used by the closure isn't necessarily known in advance?

tim_yates · Accepted Answer

Not sure this will work in the long run, and you might be better off returning a list containing the values to pass to the closure to get the next set of data, but you can interrogate the closure's binding.
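The first suggestion is to return the values needed for the next call instead of keeping them inside the closure. That might look like the sketch below. The hypothetical step closure receives the previous state and a row, and returns the next state as a plain map. The state is then an ordinary value that the caller can save at the end of a run and pass back in on the next one. The output list is an in-memory stand-in for the output table.

```groovy
// Hypothetical explicit-state version of the moving-mean step: nothing is
// captured except the output list; the window goes in and out as a map
def output = []
Closure<Map> step = { Map state, Map row ->
    if (state.prev2Val != null) {
        output << [state.prevId, (state.prev2Val + state.prevVal + row.val) / 3]
    }
    [prev2Val: state.prevVal, prevVal: row.val, prevId: row.id]
}

def initial = [prev2Val: null, prevVal: null, prevId: null]
def rows = (1..4).collect { [id: it, val: it] }

// First run: process two rows, then keep the returned state (e.g. in the database)
def saved = rows[0..1].inject(initial) { state, row -> step(state, row) }
// Later run: resume from the saved state with the rows that arrived since
def finalState = rows[2..3].inject(saved) { state, row -> step(state, row) }

println output   // [[2, 2], [3, 3]]
```

Because the state is explicit, there is no need to know which variables the closure happens to capture. The trade-off is that the closure must be written in this style from the start.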

Given:

def closure = { row ->
  a = 1
  b = 2
  c = 4
}

If you execute it:

closure( 1 )

You can then compose a function like:

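def extractVarsFromClosure( Closure cl ) {
  // cl.binding resolves to the binding of the script that owns the closure;
  // skip variables whose names start with _ and the script's args
  cl.binding.variables.findAll { 
    !it.key.startsWith( '_' ) && it.key != 'args'
  }
}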
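In the other direction, values loaded from the database could be written back into the same binding before the closure runs again. The restoreVarsToClosure function below is a hypothetical counterpart built on Binding.setVariable. Like the function above, it assumes that the closure was defined in a script (so cl.binding is the script's binding) and that it keeps its state in undeclared variables. Seeding also matters on the very first call: reading an undeclared variable that has never been assigned throws a MissingPropertyException.

```groovy
// Hypothetical counterpart to extractVarsFromClosure: writes saved values back
// into the binding of the script that owns the closure
def restoreVarsToClosure(Closure cl, Map vars) {
    vars.each { name, value -> cl.binding.setVariable(name as String, value) }
}

// A step that keeps its window in undeclared (binding) variables
def step = { row ->
    prev2Val = prevVal   // throws MissingPropertyException if prevVal was never set
    prevVal = row
}

// e.g. values loaded from the database at start-up
restoreVarsToClosure(step, [prev2Val: null, prevVal: 5])
step(6)
```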

Which when executed:

println extractVarsFromClosure( closure )

prints:

[a:1, b:2, c:4]

However, any 'free' variables assigned in the script without a def also end up in the binding (cl.binding is the binding of the script that owns the closure, not one belonging to the closure itself), so:

fish = 42
println extractVarsFromClosure( closure )

will print:

[a:1, b:2, c:4, fish:42]

But

def fish = 42
println extractVarsFromClosure( closure )

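will not include fish, because a variable declared with def is a local variable of the script rather than an entry in its binding.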
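This is why the code in the question would need adjusting before this approach applies. prev2Val, prevVal and prevId are declared with def, so the closure captures them as local variables and they never reach the binding. The small sketch below shows the difference. The names declared and undeclared are only illustrative.

```groovy
def extractVarsFromClosure(Closure cl) {
    cl.binding.variables.findAll {
        !it.key.startsWith('_') && it.key != 'args'
    }
}

def declared = null        // local variable: captured by the closure
def step = { row ->
    declared = row         // updates the captured local, not the binding
    undeclared = row       // no def anywhere: stored in the script binding
}
step(3)

def vars = extractVarsFromClosure(step)
println vars.containsKey('undeclared')   // true
println vars.containsKey('declared')     // false
```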
