Weize Sun
Weize Sun

Reputation: 155

How to share information between parallel executions in Apache Flink?

I would like to know if there is any way to transfer information between the parallel executions within one operator in Apache Flink during runtime? I just have to send a little messages. Broadcast variable in Flink does not work because it cannot broadcast during runtime.

Upvotes: 0

Views: 480

Answers (1)

kkrugler
kkrugler

Reputation: 9245

Because Flink doesn't yet support both broadcast and keyed streams going to the same function (what is being called "side-input support"), it becomes a bit tricky to do what you want without external shared state. But I think it's possible, with iterations.

E.g. for k-means you have <point, centroid id> as your tuple, where input values have null for the centroid id. Partition by point to a custom FlatMap that outputs <point, centroid id, distance>. Then repartition by point and find the closest centroid. Output <point, centroid id>, then repartition by centroid id and process with custom Map that keeps track of centroid center point. Emit updated <center point, centroid id> and have this iterate back up to the top.

So the first custom FlatMap needs to handle a real point (with null centroid id) and a centroid (centroid point, centroid id) differently, where it adds centroids to its state, so it can calculate distances for any real points it receives.

Upvotes: 1

Related Questions