Reputation: 3165
I am trying to use Apache Flink to process a data stream using two different algorithms. My pseudo code is as follows:
env = getEnvironment();
DataStream<Event> inputStream = getInputStream();
// How to replicate the input stream?
Array[DataStream<Event>] inputStreams = inputStream.clone()
// apply different operations on the replicated streams
outputOne = inputStreams[0].map(func1);
outputTwo = inputStreams[1].map(func2);
...
outputOne.addSink(sink1);
outputTwo.addSink(sink2);
env.execute();
I did some research with Flink documentation. It seems there is no concept of cloning a stream. Neither DataStream.iterate() nor DataStream.split() are doing exactly what I want. Is there an alternative to creating a stream multiple times from its source? Thank you for your help.
Upvotes: 3
Views: 2560
Reputation: 18987
"Cloning" a stream is quite simple and does not require a dedicated operator. You can just apply multiple transformation on the same DataStream
. All downstream transformations will consume the complete stream.
So in your example you do:
env = getEnvironment();
DataStream<Event> inputStream = getInputStream();
outputOne = inputStream.map(func1); // apply 1st transformation
outputTwo = inputStream.map(func2); // apply 2nd transformation
...
outputOne.addSink(sink1);
outputTwo.addSink(sink2);
env.execute();
Upvotes: 15