Reputation: 123
We're running multiple streaming Dataflow pipelines that consistently hang after about 25 days of running and need to be restarted.
Has anyone else seen this?
Is there some sort of max time a pipeline can run for?
Are there any recommended best practices for restarting streaming jobs on a more frequent cadence, even if there are no code changes (i.e. should we restart the pipeline every 2 weeks? every week?)?
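Not an official recommendation, but one common pattern for a scheduled restart is to drain the running job (so in-flight elements finish processing) and then relaunch it, e.g. from a cron job. A minimal sketch using the real `gcloud dataflow jobs` commands — the job name, region, and template path below are placeholders, not values from this thread, and the function only prints the commands so you can review them before wiring it up:

```shell
#!/usr/bin/env bash
# Hedged sketch, not a vetted recipe: one drain-and-relaunch cycle for a
# streaming Dataflow job, suitable for running on a fixed cadence from cron.

# Prints the gcloud commands for the cycle. Swap the `echo`s for direct
# execution once the placeholders are filled in for your project.
restart_dataflow_job() {
  local job_name="$1" region="$2" template="$3"
  # 1) Look up the active job's ID by name.
  echo gcloud dataflow jobs list --region="$region" --status=active \
       --filter="name=$job_name" --format="value(id)"
  # 2) Drain rather than cancel, so elements already read from Pub/Sub are
  #    processed before the workers shut down (avoids dropping data).
  echo gcloud dataflow jobs drain '"$JOB_ID"' --region="$region"
  # 3) After the drain completes, relaunch the pipeline from a staged template.
  echo gcloud dataflow jobs run "$job_name" --region="$region" \
       --gcs-location="$template"
}

# Example invocation (placeholder values):
restart_dataflow_job "my-streaming-job" "us-east1" \
                     "gs://my-bucket/templates/my-template"
```

Draining can take a while on a busy pipeline, so in practice you would poll the job state and only relaunch once it reports `JOB_STATE_DRAINED`.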
Upvotes: 1
Views: 442
Reputation: 324
Same thing is happening here. A Dataflow job of ours that was fetching data from Pub/Sub hung about 20 days ago, which caused data loss for one of our customers.
Yesterday we restarted the Dataflow job and it is already stuck again. We run multiple copies of this job across several customer projects, and all the other copies are running fine, which seems to indicate a bug in GCP Dataflow.
This Dataflow job is running in us-east1 with the Apache Beam SDK for Java 2.6.0.
This issue seems related to https://status.cloud.google.com/incident/cloud-dataflow/19001
Any ideas on how to fix the hanging?
Regards
Upvotes: 1