Achyuth Samudrala

Reputation: 31

Process elements after sinking to Destination

I am setting up a Flink pipeline that reads from Kafka and sinks to HDFS. I want to process the elements after the addSink() step, because I want to create trigger files indicating that writing data (to the sink) for a certain partition/hour is complete. How can this be achieved? Currently I am using the BucketingSink.

  1. DataStream messageStream = env.addSource(flinkKafkaConsumer011);

  2. // some aggregations to convert the message stream to a keyedStream

  3. keyedStream.addSink(sink);

// How can elements be processed after step 3?

Upvotes: 2

Views: 1869

Answers (1)

David Anderson

Reputation: 43524

The Flink APIs do not support extending the job graph beyond the sink(s). (You can, however, fork the stream and do additional processing in parallel with writing to the sink.)
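For example, a minimal sketch of the forking approach, continuing the question's snippet (the print() here is just a placeholder for whatever additional processing is needed):

    keyedStream.addSink(sink);   // write to HDFS, as in step 3

    // fork: attach a second consumer to the same stream; it runs
    // in parallel with the sink rather than after it
    keyedStream.print();         // placeholder for any additional processing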

With the StreamingFileSink you can observe the part files transition to the finished state when they complete. See the JavaDoc for more information.
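As a sketch of what that sink looks like (assuming Flink 1.6+, a row format, and hourly buckets; the HDFS path and date format are placeholder assumptions):

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

    // one bucket per hour; part files within a bucket transition
    // in-progress -> pending -> finished as checkpoints complete
    StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("hdfs:///data/output"),
                      new SimpleStringEncoder<String>("UTF-8"))
        .withBucketAssigner(new DateTimeBucketAssigner<>("yyyy-MM-dd--HH"))
        .build();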

State lives within a single operator, and only that operator (e.g., a ProcessFunction) can modify it, so there's no straightforward way to modify keyed value state after the sink has completed. One idea would be to add a processing-time timer to the ProcessFunction that holds the keyed state, so that it wakes up periodically, checks for newly finished part files, and modifies the state based on their existence. If that's the wrong granularity, write a custom source that does something similar and streams or broadcasts that information into the ProcessFunction (which will then have to be a CoProcessFunction or a KeyedBroadcastProcessFunction) so it can do the necessary state updates.
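A rough sketch of the timer idea, with the caveats that PartFileWatcher, the bucket path, and the check interval are all hypothetical, and that the actual part-file inspection is left as a placeholder:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    // Hypothetical: keyed by bucket id (e.g., an hour string); a periodic
    // processing-time timer checks whether that bucket's part files are finished.
    public class PartFileWatcher extends KeyedProcessFunction<String, String, String> {

        private static final long CHECK_INTERVAL_MS = 60_000;
        private transient ValueState<Boolean> bucketComplete;

        @Override
        public void open(Configuration parameters) {
            bucketComplete = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("bucketComplete", Boolean.class));
        }

        @Override
        public void processElement(String value, Context ctx, Collector<String> out)
                throws Exception {
            // for brevity this registers a timer per element; a real job would
            // track whether a timer is already pending for this key
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + CHECK_INTERVAL_MS);
            out.collect(value);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out)
                throws Exception {
            // placeholder bucket path derived from the key; replace with real logic
            // that lists part files and verifies none are still in-progress/pending
            Path bucket = new Path("hdfs:///data/output/" + ctx.getCurrentKey());
            FileSystem fs = bucket.getFileSystem();
            boolean allFinished = fs.exists(bucket);  // placeholder check

            if (allFinished && bucketComplete.value() == null) {
                bucketComplete.update(true);
                out.collect("bucket complete: " + ctx.getCurrentKey());
            }
            // re-register so the check repeats until the bucket is complete
            ctx.timerService().registerProcessingTimeTimer(timestamp + CHECK_INTERVAL_MS);
        }
    }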

Upvotes: 1
