Reputation: 720
I am currently using the stopit library (https://github.com/glenfant/stopit) to set per-element processing timeouts in batch jobs. These jobs run on the direct runner, and I am able to time out functions that take too long.
What is the Beam way of setting a per-element processing timeout for a batch job?
Is there a way I could set a processing timeout with a trigger for a Dataflow batch job?
My use case is extracting named entities from a text. The NER process sometimes takes too long if the document being processed is too long.
It would be nice to get rid of this dependency and move to a Beam-native solution.
Upvotes: 1
Views: 2310
Reputation: 507
As I understand it, the answer is timely processing, which also maintains state. Say you have a function f whose output you need for each batch. When a batch arrives, mark it by updating a state cell, and set a timer that fires at a watermark or expiry time you choose. If f produces output before the timer fires, you receive it as usual; if no output has arrived by then, you can re-route that batch.
This is not an exact solution, but it can be worked out.
For a better understanding, you can read about it in the Apache Beam documentation.
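To make the "re-route on timeout" idea concrete, here is a minimal sketch that uses only the Python standard library, not Beam's state and timer API. The names `run_with_timeout`, `fake_ner`, and `process_all` are hypothetical, and `fake_ner` is just a stand-in for the real NER step. It assumes it is acceptable for a timed-out call to keep running in a background thread, since a Python thread cannot be killed from outside.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_timeout(func, element, timeout_seconds):
    """Return (True, result) if func(element) finishes in time, else (False, element)."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, element)
    try:
        return True, future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # The worker thread is not interrupted; it keeps running in the background.
        return False, element
    finally:
        # Do not block on the (possibly still running) worker thread.
        executor.shutdown(wait=False)


def fake_ner(text):
    # Hypothetical stand-in for the NER step: long documents are slow.
    time.sleep(0.5 if len(text) > 20 else 0.0)
    return text.split()[:1]


def process_all(texts, timeout_seconds=0.1):
    """Split inputs into processed results and elements that timed out."""
    done, timed_out = [], []
    for text in texts:
        ok, value = run_with_timeout(fake_ner, text, timeout_seconds)
        (done if ok else timed_out).append(value)
    return done, timed_out
```

Inside a pipeline, the body of `process_all` would presumably live in a DoFn's `process` method, with the timed-out elements emitted to a separate tagged output so they can be handled differently. That wiring is an assumption and is not shown here.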
Upvotes: 2