Reputation: 155
I would like to know if there is any way to transfer information between the parallel executions within one operator in Apache Flink during runtime? I just have to send a little messages. Broadcast variable in Flink does not work because it cannot broadcast during runtime.
Upvotes: 0
Views: 480
Reputation: 9245
Because Flink doesn't yet support both broadcast and keyed streams going to the same function (what is being called "side-input support"), it becomes a bit tricky to do what you want without external shared state. But I think it's possible, with iterations.
E.g. for k-means you have <point, centroid id>
as your tuple, where input values have null for the centroid id. Partition by point to a custom FlatMap that outputs <point, centroid id, distance>
. Then repartition by point and find the closest centroid. Output <point, centroid id>
, then repartition by centroid id and process with custom Map that keeps track of centroid center point. Emit updated <center point, centroid id>
and have this iterate back up to the top.
So the first custom FlatMap needs to handle a real point (with null centroid id) and a centroid (centroid point, centroid id) differently, where it adds centroids to its state, so it can calculate distances for any real points it receives.
Upvotes: 1