Bharathi raja

Reputation: 105

Dataflow API retries several times after data format exception

I think this is a needed improvement to the Dataflow API, though I may be wrong.

I created a batch Dataflow job, and by mistake one of the lines in my input file had an invalid data format.

The pipeline threw a DataFormatException. But instead of stopping the job right away, it retried about 4 times before stopping.

This looks like the wrong behavior to me. When a batch Dataflow job hits an invalid data format, it should stop immediately instead of retrying several times first. Any ideas?

Upvotes: 2

Views: 2552

Answers (1)

Graham Polley

Reputation: 14791

It seems like Dataflow is trying to build in some fault tolerance. That's a good thing. And this behaviour is clearly documented in the Dataflow docs, under "How are Java exceptions handled in Dataflow?"

If you don't want this behaviour, write your own exception handling code: catch the exception yourself and bail out on your own terms, so the failure isn't retried.
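A minimal sketch of that idea in plain Java, not tied to the Dataflow SDK. The class `SafeParse`, its `parseLine` helper, and the one-integer-per-line format are all hypothetical, and `NumberFormatException` stands in for whatever format exception your own parser throws. The point is only that the exception is caught where the record is parsed, so it never propagates up and triggers a retry; the bad lines are set aside instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SafeParse {
    /** Parsed values plus the raw lines that could not be parsed. */
    public static final class Result {
        public final List<Integer> parsed = new ArrayList<>();
        public final List<String> rejected = new ArrayList<>();
    }

    // Hypothetical parser: assumes each input line holds a single integer.
    static int parseLine(String line) {
        return Integer.parseInt(line.trim());
    }

    public static Result parseAll(List<String> lines) {
        Result result = new Result();
        for (String line : lines) {
            try {
                result.parsed.add(parseLine(line));
            } catch (NumberFormatException e) {
                // Handle the bad record here instead of rethrowing, so the
                // caller never sees the exception and has nothing to retry.
                result.rejected.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = parseAll(Arrays.asList("1", "oops", "3"));
        System.out.println("parsed=" + r.parsed + " rejected=" + r.rejected);
    }
}
```

In a real pipeline the same try/catch would presumably sit inside your element-processing code, and you could decide there whether to log the rejected lines, write them somewhere for inspection, or fail the job once after the run.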

Upvotes: 3
