I am a beginner in Dataflow. There is a concept I'm not sure I understand and this is the "state".
When people talk about the pipeline state, does that mean the data in the pipeline? For example, when taking a Dataflow snapshot, the documentation says there are two options:
Does the "state" in option 1 mean the pipeline itself (the DAG) plus the data in flight? What exactly does "state" mean? And if the data in flight is saved, why do we also need to take a snapshot of the source?
Thank you
Guy
Yes, it means the running pipeline and the data in flight. With a snapshot, you can recreate the state of the running job in a newer version of the pipeline. It's essentially a way to update a streaming job without draining it.
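A minimal sketch of that workflow, assuming the `gcloud` CLI and the Beam Python SDK on the Dataflow runner. It only builds the command lines and does not run them. The flag spellings (`--job-id`, `--region`, `--snapshot-ttl`, `--snapshot-sources`, and Beam's `--create_from_snapshot`) are as I understand the Dataflow snapshot docs; verify them with `gcloud dataflow snapshots create --help`. The job, project, region, and snapshot IDs are hypothetical placeholders.

```python
import shlex
from typing import List


def snapshot_create_cmd(job_id: str, region: str, ttl: str = "7d",
                        snapshot_sources: bool = True) -> List[str]:
    """Build (but do not run) a `gcloud dataflow snapshots create` command.

    snapshot_sources=True also snapshots the Pub/Sub source alongside the
    job state (flag name assumed from the Dataflow docs).
    """
    return [
        "gcloud", "dataflow", "snapshots", "create",
        f"--job-id={job_id}",
        f"--region={region}",
        f"--snapshot-ttl={ttl}",  # how long the snapshot is kept
        f"--snapshot-sources={'true' if snapshot_sources else 'false'}",
    ]


def launch_from_snapshot_args(snapshot_id: str, project: str,
                              region: str) -> List[str]:
    """Pipeline options for starting the new pipeline version from a snapshot."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        "--streaming",
        f"--create_from_snapshot={snapshot_id}",  # Beam Python option
    ]


if __name__ == "__main__":
    # Hypothetical IDs; substitute your own job, project, and region.
    print(shlex.join(snapshot_create_cmd("my-job-id", "us-central1")))
    print(launch_from_snapshot_args("my-snapshot-id", "my-project", "us-central1"))
```

You would run the first command against the old job, then pass the second list (plus your pipeline's own arguments) when launching the updated pipeline.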
The source snapshot is specific to Pub/Sub: when the new job reads from the existing subscription, it knows the acknowledgement (ack) state of the messages that were in flight.
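To see why the ack state matters, here is a toy model. It is not how Dataflow is implemented. The assumption is that the pipeline acks a message once the message is part of its state. The old job keeps running after the snapshot and acks `m3`. If the new job restores only the job state, `m3` is neither in that state nor redeliverable. Rolling the subscription back to the source snapshot makes it deliverable again. `ToySubscription`, `run_step`, and `restore` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ToySubscription:
    """Toy stand-in for a Pub/Sub subscription: unacked messages stay deliverable."""
    messages: List[str]
    acked: Set[str] = field(default_factory=set)

    def pull(self, n: int) -> List[str]:
        return [m for m in self.messages if m not in self.acked][:n]

    def ack(self, msg_id: str) -> None:
        self.acked.add(msg_id)

    def snapshot(self) -> Set[str]:
        # Source snapshot: which messages had been acknowledged at this point.
        return set(self.acked)

    def seek(self, snap: Set[str]) -> None:
        # Roll the ack state back; messages acked after the snapshot become deliverable.
        self.acked = set(snap)


def run_step(sub: ToySubscription, state: Set[str], n: int) -> None:
    """Hypothetical pipeline step: add messages to job state, then ack them."""
    for msg_id in sub.pull(n):
        state.add(msg_id)  # the message is now part of the pipeline state
        sub.ack(msg_id)    # so the pipeline acks it


def restore(with_source_snapshot: bool) -> Tuple[List[str], List[str]]:
    sub = ToySubscription(["m1", "m2", "m3", "m4"])
    state: Set[str] = set()
    run_step(sub, state, 2)       # m1, m2 processed and acked
    job_snap = set(state)         # job-state snapshot
    source_snap = sub.snapshot()  # Pub/Sub (source) snapshot
    run_step(sub, state, 1)       # old job keeps running: m3 acked
    new_state = set(job_snap)     # new job starts from the job snapshot
    if with_source_snapshot:
        sub.seek(source_snap)
    # (what the new job's state holds, what it will still receive)
    return sorted(new_state), sub.pull(10)


print(restore(with_source_snapshot=False))  # (['m1', 'm2'], ['m4']): m3 is lost
print(restore(with_source_snapshot=True))   # (['m1', 'm2'], ['m3', 'm4'])
```

With only the job snapshot, `m3` disappears: the old job acked it after the snapshot was taken. With the source snapshot as well, the job state and the subscription agree on what has been consumed.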