Reputation: 105
I think this may be a needed improvement in the Dataflow API, though I may be wrong.
I created a batch Dataflow job, and by mistake one of the lines in my input file had an invalid data format.
The pipeline threw a DataFormatException, but instead of stopping the job right away it retried several times (~4 times) before failing.
This seems like the wrong behavior to me. When a batch Dataflow job encounters invalid data, it should fail immediately instead of retrying several times and then stopping. Ideas?
Upvotes: 2
Views: 2552
Reputation: 14791
It seems like Dataflow is trying to build in some fault tolerance, which is a good thing. This behaviour is clearly documented here ("How are Java exceptions handled in Dataflow?").
If you don't want this behaviour, write your own exception handling code and bail out yourself when a record can't be processed, so the exception never propagates and triggers a retry.
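As a rough sketch of that idea: catch the parse error yourself for each record and set the bad ones aside, rather than letting the exception escape (an escaped exception is what the Dataflow service retries, ~4 times in batch mode, before failing the job). The `parseAll` method and the integer-parsing step below are hypothetical stand-ins for your own record parsing; in a real pipeline you would wrap the body of your DoFn's `processElement` the same way.

```java
import java.util.ArrayList;
import java.util.List;

public class SafeParse {
    // Parse each line, collecting failures instead of throwing.
    // A thrown exception would cause Dataflow to retry the bundle;
    // catching it here lets the job skip (or log) the bad record and continue.
    public static List<Integer> parseAll(List<String> lines, List<String> badLines) {
        List<Integer> out = new ArrayList<>();
        for (String line : lines) {
            try {
                out.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {
                // Record the offending line for later inspection
                // (in a real pipeline, e.g. write it to a side output).
                badLines.add(line);
            }
        }
        return out;
    }
}
```

You could also choose to fail fast instead: rethrow as a RuntimeException after logging, if you truly want the first bad record to kill the job.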
Upvotes: 3