Yves
Yves

Reputation: 12431

How to merge two DataStreams in Apache Flink

I'm using Flink to process my streaming data.

I have two data sources: A and B.

// A
DataStream<String> dataA = env.addSource(sourceA);
// B
DataStream<String> dataB = env.addSource(sourceB);

I use map to process the data coming from A and B.

DataStream<String> res = mergeDataAAndDataB();   // how to merge dataA and dataB?

Saying that sourceA is sending: "aaa", "bbb", "ccc"..., sourceB is sending: "A", "B", "C"....

What I'm trying to do is to merge them as Aaaa, Bbbb, Cccc... to generate a new DataStream<String> object.

How to achieve this?

Upvotes: 4

Views: 4714

Answers (1)

David Anderson
David Anderson

Reputation: 43707

There are two kinds of stream merging in Flink.

dataA.union(dataB)

will create one new stream that has the elements of both streams, blended in some arbitrary way, perhaps "aaa", "bbb", "A", "ccc", "B", "C", which isn't what you've asked for -- just mentioning it for completeness.

What you do want is to create a connected stream, via

dataA.connect(dataB)

which you can then process with a RichCoFlatMapFunction or a KeyedCoProcessFunction to compute a sort of join that glues the strings together.

You'll find a tutorial on the topic of connected streams in the Flink documentation, and an example that's reasonably close in the training exercises that accompany the tutorials.

Upvotes: 5

Related Questions