Reputation: 347
I have a pipeline in beam java using flink
It run as follow
1. kafka topic (event_type1)
2. fanout (1 event_type1 map to 1000 event_type2)
3. repartition
4. process event_type2
5. notify event_type1 finish
The problem is that I need all event_type2
spawn by the same event_type1
to finish before announcing that certain event_type1
is finished
For example
event_type1:id1
event_type2:id_a:from_id1
event_type2:id_b:from_id1
event_type1:id2
event_type2:id_a:from_id2
event_type2:id_b:from_id2
Before announcing event_type1:id1
is done, I need to finish process all two events from fanout.
Is there a canonical way of doing that using flink+beam?
I am thinking of sending a new event to last step called total_fanout_count which will contain
event_type1_id
count
and when event_type2 is processed, it will emit count of 1 for event_type1 with same id, so only when the total sum count matches between the two stream will we notify that a particular even_type1 is done.
For example
1. kafka topic (event_type1)
2. fanout (1 event_type1 map to 1000 event_type2)
a. send fanout to topic fanout
b. send total fanout count to topic fanout_count
3. repartition
4. process event_type2
a. send count of 1 for event_type1 with its id to topic fanout_sum
5. join fanout_sum and fanout_count
a. if fanout_sum and fanout_count is same send complete event
But the above way is not idempotent, because one event might be reprocessed multiple time, how to ensure all unique event_type2
per event_type1
is been processed?
Upvotes: 0
Views: 23