Daniel N.

Reputation: 93

apache flink DataSet window aggregation

I'm using the DataSet API to read a large number of files and write them to Cassandra.

In one of the steps, I'm doing a lot of HTTP requests and I would like to know how many requests per second I'm sending.

With the Streaming API this is pretty straightforward using a sliding window, but how do you do the same with the DataSet API?

Upvotes: 0

Views: 230

Answers (2)

Chris

Reputation: 33

If you assign the timestamp of the requests to the objects in your DataSet, you can use the .groupBy() transformation with a key extractor that extracts the window identifier (e.g., the second parsed from the timestamp) and then apply a count aggregation.

For example, a count aggregation can be done by extending the tuple with an Integer 1 using a .map() transformation and then summing the Integers with the .sum() transformation after the groupBy.
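The per-second count described above (key each record by its second, attach an Integer 1, sum per key) can be sketched in plain Java, independent of Flink, to show the shape of the aggregation; the request timestamps below are made up:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class RequestsPerSecond {
    // Count requests per second: key each timestamp (in millis) by its
    // second, then sum a 1 per record -- the same shape as a Flink
    // groupBy(...).sum(...) over a DataSet of (second, 1) tuples.
    public static Map<Long, Integer> countPerSecond(List<Long> timestampsMillis) {
        return timestampsMillis.stream()
                .collect(Collectors.groupingBy(
                        ts -> ts / 1000,                   // window id: the second
                        TreeMap::new,                      // sorted keys for readable output
                        Collectors.summingInt(ts -> 1)));  // the "Integer 1" column, summed
    }

    public static void main(String[] args) {
        // Hypothetical data: three requests in second 1, one in second 2
        List<Long> ts = Arrays.asList(1000L, 1200L, 1900L, 2100L);
        System.out.println(countPerSecond(ts)); // {1=3, 2=1}
    }
}
```

In the actual Flink job the same logic would be a `.map()` emitting `Tuple2<Long, Integer>` of (second, 1), followed by `.groupBy(0).sum(1)`.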

Upvotes: 0

Fabian Hueske

Reputation: 18987

You might want to have a look at Flink's metrics system. It features a Meter metric type that measures rates, i.e., how often certain events happen over time.

Unfortunately, Meter metrics are not yet included in a Flink release but are only available in the current master branch (Flink 1.2-SNAPSHOT). So you would need to build Flink yourself. See the documentation for details.
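For reference, the idea behind a rate meter can be shown with a small standalone class. This is not Flink's Meter API, just a sketch of the rate calculation, with explicit timestamps passed in to keep it deterministic:

```java
public class SimpleMeter {
    // Counts marked events and reports events-per-second over elapsed
    // time -- a sketch of what a rate meter does, not Flink's Meter.
    private final long startMillis;
    private long events = 0;

    public SimpleMeter(long startMillis) {
        this.startMillis = startMillis;
    }

    // Call once per event (e.g., per HTTP request sent)
    public void markEvent() {
        events++;
    }

    // Average events per second between start and nowMillis
    public double getRate(long nowMillis) {
        long elapsedMillis = nowMillis - startMillis;
        return elapsedMillis <= 0 ? 0.0 : events * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        SimpleMeter meter = new SimpleMeter(0L);
        for (int i = 0; i < 50; i++) {
            meter.markEvent();               // 50 requests...
        }
        System.out.println(meter.getRate(10_000L)); // ...over 10s -> 5.0
    }
}
```

A real meter (like Flink's) would typically report a rate over a recent moving window rather than since start, but the counting principle is the same.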

Upvotes: 0
