olaf
olaf

Reputation: 347

How to have at least once guarantee for fanout in beam with flink runner?

I have a pipeline in beam java using flink

It run as follow

1. kafka topic (event_type1)
2. fanout (1 event_type1 map to 1000 event_type2)
3. repartition
4. process event_type2 
5. notify event_type1 finish

The problem is that I need all event_type2 spawn by the same event_type1 to finish before announcing that certain event_type1 is finished

For example

event_type1:id1
    event_type2:id_a:from_id1
    event_type2:id_b:from_id1

event_type1:id2
    event_type2:id_a:from_id2
    event_type2:id_b:from_id2

Before announcing event_type1:id1 is done, I need to finish process all two events from fanout.

Is there a canonical way of doing that using flink+beam?

I am thinking of sending a new event to last step called total_fanout_count which will contain

event_type1_id
count

and when event_type2 is processed, it will emit count of 1 for event_type1 with same id, so only when the total sum count matches between the two stream will we notify that a particular even_type1 is done.

For example

1. kafka topic (event_type1)
2. fanout (1 event_type1 map to 1000 event_type2)
    a. send fanout to topic fanout
    b. send total fanout count to topic fanout_count
3. repartition
4. process event_type2
    a. send count of 1 for event_type1 with its id to topic fanout_sum
5. join fanout_sum and fanout_count
    a. if fanout_sum and fanout_count is same send complete event

But the above way is not idempotent, because one event might be reprocessed multiple time, how to ensure all unique event_type2 per event_type1 is been processed?

Upvotes: 0

Views: 23

Answers (0)

Related Questions