del bao
del bao

Reputation: 1164

Flink large size / small advance sliding window performance

My use case

This post laid out a good optimization solution by a tumbling window of 1 day

So my logic would be like

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

val oneDayCounts = joins
  .keyBy(keyFunction)
  .map(t => (t.key, 1L, t.timestampMs))
  .keyBy(0)
  .timeWindow(Time.days(1))

val sevenDayCounts = oneDayCounts
  .keyBy(0)
  .timeWindow(Time.days(7), Time.minutes(10))
  .sum(1)

// single reducer
sevenDayCounts
  .windowAll(TumblingProcessingTimeWindows.of(Time.minutes(10)))

P.S. forget about the performance concern of the single reducer.

Question

If I understand correctly, however, this would mean a single event would produce 7*24*6=1008 records due to the nature of the sliding window. So my question is how can I reduce the sheer amount?

Upvotes: 2

Views: 982

Answers (1)

David Anderson
David Anderson

Reputation: 43499

There's a JIRA ticket -- FLINK-11276 -- and a google doc on the topic of doing this more efficiently.

I also recommend you take a look at this paper and talk about Efficient Window Aggregation with Stream Slicing.

Upvotes: 4

Related Questions