Reputation: 3
We have a Dataflow pipeline consuming from a Pub/Sub subscription and writing into BigQuery in streaming. Due to a permissions issue the pipeline got stuck and the messages were not consumed, so we restarted the pipeline, saved the unacked messages in a snapshot, and replayed the messages, but they are discarded.
We fixed the problem and re-deployed the pipeline with a new subscription to the topic, and all new events are consumed in streaming without a problem.
For the ~20M unacked messages accumulated in the first subscription, we created a snapshot.
This snapshot was then connected to the new subscription via the UI, using the Replay messages dialog.
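For reference, this is roughly the programmatic equivalent of those UI steps, using the google-cloud-pubsub client library (the project and resource names below are placeholders, not our real ones):

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
old_sub = subscriber.subscription_path("my-project", "old-subscription")
new_sub = subscriber.subscription_path("my-project", "new-subscription")
snapshot = subscriber.snapshot_path("my-project", "unacked-snapshot")

# Capture the unacked backlog of the stuck subscription.
subscriber.create_snapshot(request={"name": snapshot, "subscription": old_sub})

# Replay: seek the new subscription (same topic) back to the snapshot.
subscriber.seek(request={"subscription": new_sub, "snapshot": snapshot})
```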
In the subscription metrics dashboard we see the unacked messages spike to 20M and then get consumed.
But the events never arrive in BigQuery. Checking the Dataflow job metrics, we can see a spike in the Duplicate message count within the "Read from Pub/Sub" step.
The messages are < 3 days old. Does anybody know why this happens? Thanks in advance.
The pipeline is using Apache Beam SDK 2.39.0 and Python 3.9, with Streaming Engine and Runner v2 enabled.
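Concretely, the options look roughly like this (project and region are placeholders; the flag names are the standard Dataflow ones):

```python
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    streaming=True,
    project="my-project",            # placeholder
    region="us-central1",            # placeholder
    enable_streaming_engine=True,    # Streaming Engine
    experiments=["use_runner_v2"],   # Dataflow Runner v2
)
```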
Upvotes: 0
Views: 286
Reputation: 1428
How long does it take to process a Pub/Sub message? Is it a long-running process?
If so, Pub/Sub may redeliver messages, depending on the subscription's configuration and acknowledgement deadline. See Subscription retry policy.
Dataflow can work around that, since it acknowledges messages from the source only after a successful shuffle. Adding a GroupByKey (or, artificially, a Reshuffle) transform may therefore resolve source duplications; see the sketch below the link.
More information at https://beam.apache.org/contribute/ptransform-style-guide/#performance
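A minimal sketch of where such a Reshuffle could sit in a Python pipeline like yours (the subscription path and table name are placeholders, and the table is assumed to already exist):

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

opts = PipelineOptions(streaming=True)  # plus the Dataflow options from the question

with beam.Pipeline(options=opts) as p:
    (p
     | "Read" >> beam.io.ReadFromPubSub(
           subscription="projects/my-project/subscriptions/new-subscription")
     | "Parse" >> beam.Map(json.loads)
     # Dataflow acknowledges the source only after a successful shuffle,
     # so this forces that checkpoint before the BigQuery write.
     | "Reshuffle" >> beam.Reshuffle()
     | "Write" >> beam.io.WriteToBigQuery(
           "my-project:dataset.table",
           method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS))
```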
Upvotes: 0