Reputation: 122
I'm searching for the best way to merge multiple (>20) Flink streams that represent different origins of events in our system, All have the same type.
List<DataStream<Event>> dataStreams = ...
Where each object is a POJO (an abstract representation obviously)
public class Event implements Serializable {
public String userId;
public long eventTimestamp;
public String eventData;
}
I eventually want to end up with a single stream
DataStream<Event> merged;
There are different ways to manage that: join
, coGroup
, map
/flatMap
(using CoGroup
) & union
.
I'm not sure which of them will give me the quickest throughput of the events from the original streams to the merged one.
Moreover, Is there an operator that will be used on all streams at once or should I just call on each 2 streams at a time?
I'm looking to get one stream which then will be keyedBy
userId
field, does that make any difference?
On a side note, the next step is to 'sort' the events (in each window
) for each userId
by the eventTimestamp
to get a chronological order of events per such userId
.
Upvotes: 1
Views: 803
Reputation: 3864
If the events have the same type I would surely go with union
as it's the simplest form and the easiest one. Also, notice that the union takes vararg as the parameter, which basically means that You can join all the streams in one call.
Upvotes: 3