Sonali Bendre

Reputation: 11

Global Aggregation in Flink over streaming data

I am currently writing an aggregation use case using Flink 1.0. As part of the use case, I need to get the count of APIs that were logged in the last 10 minutes.

This I can easily do using keyBy("api"), then applying a 10-minute window and doing a sum(count) operation.
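
For reference, a minimal sketch of that windowed count (the source and field layout are placeholders; here the log is reduced to (api, 1) pairs instead of a POJO with an "api" field):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ApiCountJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source: each element stands for one logged API call.
        DataStream<String> apiLog = env.fromElements("login", "search", "login");

        apiLog
            // Turn every log entry into an (api, 1) pair.
            .map(new MapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> map(String api) {
                    return new Tuple2<>(api, 1);
                }
            })
            .keyBy(0)                        // group by the api name
            .timeWindow(Time.minutes(10))    // 10-minute tumbling window
            .sum(1)                          // sum the per-event counts
            .print();

        env.execute("API counts per 10-minute window");
    }
}
```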

But the problem is that my data might arrive out of order, so I need some way to get a consistent count of APIs across the 10-minute window.

For example: if the same API log arrives in 2 different windows, I should get a global count for it, i.e. 2, and not two separate records displaying a count of 1 for each window.

I also don't want incremental counts, i.e. each record with the same key displayed many times with the count equal to the incremental value.

I want the record to be displayed once with a global count, something like updateStateByKey() in Spark.

Can we do that?

Upvotes: 1

Views: 1678

Answers (1)

Fabian Hueske

Reputation: 18987

You should have a look at Flink's event-time feature, which produces consistent results for out-of-order streams. Event time means that Flink processes data based on timestamps that are part of the events, not on the machine's wall-clock time.

If you use event time (with appropriate watermarks), Flink will automatically handle events that arrive out of order.
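
As a rough sketch of what that looks like (the timestamp-assignment API shown here is from later Flink 1.x releases and may differ slightly in 1.0; the source and the 1-minute out-of-orderness bound are assumptions):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

public class EventTimeApiCountJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Window operators use the timestamp carried by each event,
        // not the machine's wall clock.
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Placeholder source: (api, eventTimestampMillis) pairs.
        DataStream<Tuple2<String, Long>> apiLog = env.fromElements(
                Tuple2.of("login", 1000L),
                Tuple2.of("search", 5000L),
                Tuple2.of("login", 3000L));   // arrives out of order

        apiLog
            // Extract each event's own timestamp and emit watermarks that
            // tolerate up to 1 minute of out-of-orderness.
            .assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>>(Time.minutes(1)) {
                    @Override
                    public long extractTimestamp(Tuple2<String, Long> event) {
                        return event.f1;
                    }
                })
            // Record timestamps survive this map; reduce the payload to (api, 1).
            .map(new MapFunction<Tuple2<String, Long>, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> map(Tuple2<String, Long> e) {
                    return new Tuple2<>(e.f0, 1);
                }
            })
            .keyBy(0)
            .timeWindow(Time.minutes(10))   // windows now defined by event time
            .sum(1)
            .print();

        env.execute("Event-time API counts per 10-minute window");
    }
}
```

With event-time windows, all events whose timestamps fall into the same 10-minute window are counted together, regardless of the order in which they arrive.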

Upvotes: 1
