Wei Ma
Wei Ma

Reputation: 3165

Apache Flink Process Stream Multiple Times

I am trying to use Apache Flink to process a data stream using two different algorithms. My pseudo code is as follows:

env = getEnvironment();
DataStream<Event> inputStream = getInputStream();
// How to replicate the input stream?
Array[DataStream<Event>] inputStreams = inputStream.clone()

// apply different operations on the replicated streams
outputOne = inputStreams[0].map(func1);
outputTwo = inputStreams[1].map(func2);
 ...
outputOne.addSink(sink1);
outputTwo.addSink(sink2);
env.execute();

I did some research with Flink documentation. It seems there is no concept of cloning a stream. Neither DataStream.iterate() nor DataStream.split() are doing exactly what I want. Is there an alternative to creating a stream multiple times from its source? Thank you for your help.

Upvotes: 3

Views: 2560

Answers (1)

Fabian Hueske
Fabian Hueske

Reputation: 18987

"Cloning" a stream is quite simple and does not require a dedicated operator. You can just apply multiple transformation on the same DataStream. All downstream transformations will consume the complete stream.

So in your example you do:

env = getEnvironment();
DataStream<Event> inputStream = getInputStream();

outputOne = inputStream.map(func1); // apply 1st transformation
outputTwo = inputStream.map(func2); // apply 2nd transformation
...
outputOne.addSink(sink1);
outputTwo.addSink(sink2);
env.execute();

Upvotes: 15

Related Questions