Reputation: 11
I am currently writing an aggregation use case using Flink 1.0. As part of the use case, I need to get the count of APIs that were logged in the last 10 minutes.
This I can easily do using keyBy("api"), then applying a 10-minute window and a sum(count) operation.
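For reference, here is a minimal sketch of that pipeline in the Java DataStream API; the ApiLog POJO, its field names, and the inline example input are hypothetical placeholders for your actual log type and source:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ApiCountJob {

    // Hypothetical POJO for one logged API call.
    public static class ApiLog {
        public String api;       // API name, used as the key
        public long count;       // per-record count, typically 1
        public long timestamp;   // event time in milliseconds

        public ApiLog() {}
        public ApiLog(String api, long count, long timestamp) {
            this.api = api;
            this.count = count;
            this.timestamp = timestamp;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Example input; in practice this would come from Kafka or another source.
        DataStream<ApiLog> logs = env.fromElements(
                new ApiLog("/users", 1, 1_000L),
                new ApiLog("/users", 1, 2_000L),
                new ApiLog("/orders", 1, 1_500L));

        logs.keyBy("api")                     // group records by API name
            .timeWindow(Time.minutes(10))     // tumbling 10-minute window
            .sum("count")                     // sum the counts within each window
            .print();

        env.execute("API counts per 10-minute window");
    }
}
```

Note that keyBy("api") keys on the POJO field by name, and timeWindow() here uses the default processing-time semantics.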
But the problem is that my data might come out of order, so I need some way to get the count of APIs across the 10-minute window.
For example: if the same API log falls into 2 different windows, I should get a global count, i.e. 2 for it, not two separate records each displaying a count of 1 per window.
I also don't want incremental counts, i.e. the same key displayed many times with the count growing incrementally each time.
I want the record to be displayed once with a global count, something like updateStateByKey() in Spark.
Can we do that?
Upvotes: 1
Views: 1678
Reputation: 18987
You should have a look at Flink's event-time feature, which produces consistent results for out-of-order streams. Event time means that Flink processes data based on timestamps that are part of the events themselves, not on the machine's wall-clock time.
If you use event time (with appropriate watermarks), Flink will automatically handle events that arrive out of order.
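Here is a sketch of what that can look like, reusing the hypothetical ApiLog POJO from the question's example. The AssignerWithPeriodicWatermarks interface is the Flink 1.x way to extract timestamps and emit watermarks (in the very earliest 1.0 builds the older TimestampExtractor interface plays the same role), and the one-minute out-of-orderness bound is an assumption you would tune to your data:

```java
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;

public class EventTimeApiCountJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Process data by the timestamps carried in the events, not the wall clock.
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // The second /users record arrives out of order but is still counted
        // in the correct 10-minute window once watermarks are in place.
        DataStream<ApiCountJob.ApiLog> logs = env.fromElements(
                new ApiCountJob.ApiLog("/users", 1, 120_000L),
                new ApiCountJob.ApiLog("/users", 1, 60_000L),
                new ApiCountJob.ApiLog("/orders", 1, 90_000L));

        logs.assignTimestampsAndWatermarks(
                new AssignerWithPeriodicWatermarks<ApiCountJob.ApiLog>() {
                    private final long maxOutOfOrderness = 60_000L; // assumed 1-minute bound
                    private long maxSeenTimestamp = Long.MIN_VALUE + maxOutOfOrderness;

                    @Override
                    public long extractTimestamp(ApiCountJob.ApiLog element, long previousTimestamp) {
                        maxSeenTimestamp = Math.max(maxSeenTimestamp, element.timestamp);
                        return element.timestamp;
                    }

                    @Override
                    public Watermark getCurrentWatermark() {
                        // The watermark trails the highest timestamp seen by the
                        // allowed out-of-orderness, so late events still get windowed.
                        return new Watermark(maxSeenTimestamp - maxOutOfOrderness);
                    }
                })
            .keyBy("api")
            .timeWindow(Time.minutes(10))
            .sum("count")
            .print();

        env.execute("Event-time API counts per 10-minute window");
    }
}
```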
Upvotes: 1