Reputation: 493

Flink pipeline without a data sink with checkpointing on

I am researching how to build a Flink pipeline without a data sink, i.e. my pipeline ends when it makes a successful API call to a datastore.

In that case, if we don't use a sink operator, how will checkpointing work?

Checkpointing is based on the concept of a pre-checkpoint epoch (all events that are persisted in state or emitted into sinks) and a post-checkpoint epoch. Is having a sink required for a Flink pipeline?

Upvotes: 6

Answers (1)

Yuval Itzchakov

Reputation: 149636

Yes, sinks are required as part of Flink's execution model:

DataStream programs in Flink are regular programs that implement transformations on data streams (e.g., filtering, updating state, defining windows, aggregating). The data streams are initially created from various sources (e.g., message queues, socket streams, files). Results are returned via sinks, which may for example write the data to files, or to standard output (for example the command line terminal).

One could argue that the call to your datastore is itself the sink implementation. You could define your own sink and execute the datastore call there.

I am not familiar with the details of your datastore, but presumably you are serializing these events and sending them to the datastore in some way. In that case, you could route all your elements to the sink operator and store each of them in a ListState, which you continuously offload and send. This way, if your application needs to be upgraded, in-flight records will not be lost; they will be recovered and sent once the job has been restored.
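
To make the buffer-and-replay idea concrete, here is a minimal, framework-free sketch in plain Java. It deliberately has no Flink dependency so that it runs on its own. The class name BufferingSinkModel, the batchSize parameter and the datastoreCall callback are all hypothetical. In a real job, the three methods would roughly correspond to SinkFunction.invoke and CheckpointedFunction.snapshotState / initializeState, with the snapshot held in a ListState rather than returned as a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Framework-free model of a buffering sink (illustration only, not Flink API).
 * The method names mirror the hooks a Flink sink would implement.
 */
public class BufferingSinkModel {
    private final int batchSize;
    // Hypothetical stand-in for the API call to the datastore.
    private final Consumer<List<String>> datastoreCall;
    // Records received but not yet sent; this is what a checkpoint must capture.
    private final List<String> buffer = new ArrayList<>();

    public BufferingSinkModel(int batchSize, Consumer<List<String>> datastoreCall) {
        this.batchSize = batchSize;
        this.datastoreCall = datastoreCall;
    }

    /** Called once per record: buffer it, and offload when the batch is full. */
    public void invoke(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            datastoreCall.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    /** Called at each checkpoint: return a copy of the records not yet sent. */
    public List<String> snapshotState() {
        return new ArrayList<>(buffer);
    }

    /** Called on restore: reload the unsent records from the last checkpoint. */
    public void initializeState(List<String> restored) {
        buffer.clear();
        buffer.addAll(restored);
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        BufferingSinkModel sink = new BufferingSinkModel(3, sent::addAll);
        sink.invoke("a");
        sink.invoke("b");
        List<String> checkpoint = sink.snapshotState(); // "a" and "b" are still unsent

        // Simulate a restart: a fresh instance restores from the checkpoint.
        BufferingSinkModel restoredSink = new BufferingSinkModel(3, sent::addAll);
        restoredSink.initializeState(checkpoint);
        restoredSink.invoke("c"); // the batch is now full, so it is sent
        System.out.println(sent);
    }
}
```

In this run the two records buffered before the simulated restart are sent together with the third one afterwards, so nothing is lost. Note that a batch sent after the last checkpoint but before a failure would be sent again on recovery, so the datastore call should ideally be idempotent.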

Upvotes: 4
