user1021712

Reputation: 335

What does the pipeline "state" mean in Dataflow?

I am a beginner in Dataflow. There is one concept I'm not sure I understand: the "state".

When talking about the pipeline state, does it mean the data in the pipeline? For example, when taking a Dataflow snapshot, the documentation says there are two options:

  1. Take a snapshot of only the pipeline state in Dataflow.
  2. Take a snapshot as described in 1, plus a snapshot of the Pub/Sub source.

The documentation

Does the state in option 1 mean the pipeline itself (the DAG) and the data in flight? What does "state" mean? And if the data in flight is saved, why do we also need to take a snapshot of the source?

Thank you

Guy

Upvotes: 1

Views: 79

Answers (1)

ningk

Reputation: 1383

Yes, it means the running pipeline and the data in flight. With the snapshot, you can recreate the state of the running job with a newer version of the pipeline. It's basically updating a streaming job without draining it.

The snapshot of the source is specifically for Pub/Sub, so that when the job created from the snapshot reads from the existing subscription, it knows the ack state of the in-flight messages.
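
To make the two options from the question concrete, here is a minimal sketch in Python that only builds the command line for each one. The helper name `snapshot_command`, the job ID, and the region are made up for illustration. The flags (`--job-id`, `--region`, `--snapshot-sources`) are as I recall them from the `gcloud dataflow snapshots create` reference, so please verify them with `gcloud dataflow snapshots create --help` before relying on this.

```python
from typing import List


def snapshot_command(job_id: str, region: str, include_sources: bool) -> List[str]:
    """Build the argv for `gcloud dataflow snapshots create` (hypothetical helper).

    Flag names are an assumption based on the gcloud reference; verify locally.
    """
    cmd = [
        "gcloud", "dataflow", "snapshots", "create",
        f"--job-id={job_id}",
        f"--region={region}",
    ]
    if include_sources:
        # Option 2: also snapshot the Pub/Sub source, so the ack state
        # of in-flight messages is captured alongside the pipeline state.
        cmd.append("--snapshot-sources=true")
    return cmd


# Placeholder job ID and region, for illustration only.
print(" ".join(snapshot_command("my-job-id", "us-central1", include_sources=False)))
print(" ".join(snapshot_command("my-job-id", "us-central1", include_sources=True)))
```

The first printed command corresponds to option 1 (pipeline state only), the second to option 2 (pipeline state plus the Pub/Sub source). The sketch only builds the commands and does not run them.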

Upvotes: 2
