Reputation: 122719
I would like to use a Groovy closure to process data coming from a SQL table. For each new row, the computation would depend on what has been computed previously. However, new rows may become available on further runs of the application, so I would like to be able to reload the closure, initialised with the intermediate state it had when the closure was last executed in the previous run of the application.
For example, a closure intending to compute the moving average over 3 rows would be implemented like this:
def prev2Val = null
def prevVal = null
def prevId = null
Closure c = { row ->
println([ prev2Val, prevVal, prevId])
def latestVal = row['val']
if (prev2Val != null) {
def movMean = (prev2Val + prevVal + latestVal) / 3
sql.execute("INSERT INTO output(id, val) VALUES (?, ?)", [prevId, movMean])
}
sql.execute("UPDATE test_data SET processed=TRUE WHERE id=?", [row['id']])
prev2Val = prevVal
prevVal = latestVal
prevId = row['id']
}
test_data
has 3 columns: id
(auto-incremented primary key), value
and processed
. A moving mean is calculated based on the two previous values and inserted into the output
table, against the id
of the previous row. Processed rows are flagged with processed=TRUE
.
If all the data was available from the start, this could be called like this:
sql.eachRow("SELECT id, val FROM test_data WHERE processed=FALSE ORDER BY id", c)
The problem comes when new rows become available after the application has already been run. This can be simulated by processing a small batch each time (e.g. using LIMIT 5
in the previous statement).
I would like to be able to dump the full state of the closure at the end of the execution of eachRow
(saving the intermediate data somewhere in the database for example) and re-initialise it again when I re-run the whole application (by loading those intermediate variable from the database).
In this particular example, I can do this manually by storing the values of prev2Val
, prevVal
and prevId
, but I'm looking for a generic solution where knowing exactly which variables are used wouldn't be necessary.
Perhaps something like c.getState()
which would return [ prev2Val: 1, prevVal: 2, prevId: 6]
(for example), and where I could use c.setState([ prev2Val: 1, prevVal: 2, prevId: 6]
) next time the application is executed (if there is a state stored).
I would also need to exclude sql
from the list. It seems this can be done using c.@sql=null
.
I realise this is unlikely to work in the general case, but I'm looking for something sufficiently generic for most cases. I've tried to dehydrate
, serialize and rehydrate
the closure, as described in this Groovy issue, but I'm not sure how to save and store all the @
fields in a single operation.
Is this possible? Is there a better way to remember state between executions, assuming the list of variables used by the closure isn't necessarily known in advance?
Upvotes: 1
Views: 1205
Reputation: 171144
Not sure this will work in the long run, and you might be better returning a list containing the values to pass to the closure to get the next set of data, but you can interrogate the binding of the closure.
Given:
def closure = { row ->
a = 1
b = 2
c = 4
}
If you execute it:
closure( 1 )
You can then compose a function like:
def extractVarsFromClosure( Closure cl ) {
cl.binding.variables.findAll {
!it.key.startsWith( '_' ) && it.key != 'args'
}
}
Which when executed:
println extractVarsFromClosure( closure )
prints:
['a':1, 'b':2, 'c':4]
However, any 'free' variables defined in the local binding (without a def
) will be in the closures binding too, so:
fish = 42
println extractVarsFromClosure( closure )
will print:
['a':1, 'b':2, 'c':4, 'fish':42]
But
def fish = 42
println extractVarsFromClosure( closure )
will not print the value fish
Upvotes: 3