Reputation: 149
I have a highly parallelized aggregation with a lot of keys that I am running across multiple nodes. I then want to do a summary aggregation across all values, similar to the code below:
val myStream = sourceStream
  .keyBy(0)
  .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
  .reduce(_ + _)
myStream.addSink(new OtherSink)

val summaryStream = myStream
  .map(Row.fromOtherRow(_))
  // parallelism is 1 by definition
  .windowAll(TumblingProcessingTimeWindows.of(Time.minutes(5)))
  .reduce(_ + _)
summaryStream.addSink(new RowSink)
This works fine, but I notice that the node that ends up running the windowAll() gets a tremendous amount of inbound network traffic, as well as a significant CPU spike. This is obviously because all of the data is being aggregated together and the parallelism is 1.
Are there any current or planned provisions in Flink for a two-tier summary aggregation that would keep the data on each node and pre-aggregate it there, before sending the results to a second tier for the final aggregation? Here is some pseudo code for what I had hoped to find:
val myStream = sourceStream
  .keyBy(0)
  .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
  .reduce(_ + _)
myStream.addSink(new OtherSink)

val summaryStream = myStream
  .map(Row.fromOtherRow(_))
  // parallelism would be at the default for the env
  .windowLocal(TumblingProcessingTimeWindows.of(Time.minutes(5)))
  .reduce(_ + _)
  // parallelism is 1 by definition
  .windowAll(TumblingProcessingTimeWindows.of(Time.minutes(5)))
  .reduce(_ + _)
summaryStream.addSink(new RowSink)
I called it 'windowLocal()', but I am sure there is a better name. It would be non-keyed, just like windowAll(). The key benefit is that it would reduce the network, CPU, and memory hit that windowAll() takes, by distributing that work across all of the nodes you are running. I currently have to allocate more resources to my nodes to accommodate this summarization.
If this can be accomplished in some other way with the current version, I would love to hear about it. I already thought about using a random value for a key for the second tier, but I believe that would result in a full rebalance of the data, so it would solve my CPU and memory issues, but not the network one. I am looking for something in the same vein as rescale(), where the data stays local to the task manager or the slot.
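To make the intent of the two-tier scheme concrete, here is a minimal plain-Scala sketch (no Flink API; `localReduce` and `globalReduce` are hypothetical names). Tier 1 reduces each node's elements locally, so only one partial result per node would ever cross the network to the single global reducer:

```scala
object TwoTierSketch {
  // Simulate elements already distributed across three nodes.
  val perNode: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5), Seq(6))

  // Tier 1: each node reduces its own elements locally (no network).
  def localReduce(nodes: Seq[Seq[Int]]): Seq[Int] =
    nodes.map(_.sum)

  // Tier 2: only one partial result per node is shipped to the
  // single global reducer, instead of every raw element.
  def globalReduce(partials: Seq[Int]): Int =
    partials.sum

  def main(args: Array[String]): Unit = {
    val partials = localReduce(perNode) // Seq(6, 9, 6)
    println(globalReduce(partials))     // prints 21
  }
}
```

Because addition is associative, the two-tier result equals the single-tier result over all raw elements; that is the property any 'windowLocal()'-style pre-aggregation would rely on.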
Upvotes: 3
Views: 926
Reputation: 452
Incremental Window Aggregation with FoldFunction
The following example shows how an incremental FoldFunction can be combined with a WindowFunction to extract the number of events in the window, and to also return the key and the end time of the window.
val input: DataStream[SensorReading] = ...
input
  .keyBy(<key selector>)
  .timeWindow(<window assigner>)
  .fold(
    ("", 0L, 0),
    (acc: (String, Long, Int), r: SensorReading) => ("", 0L, acc._3 + 1),
    (key: String,
     window: TimeWindow,
     counts: Iterable[(String, Long, Int)],
     out: Collector[(String, Long, Int)]) => {
      val count = counts.iterator.next()
      out.collect((key, window.getEnd, count._3))
    }
  )
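The core of the pattern above is that the fold keeps a single accumulator per window instead of buffering every event. A plain-Scala sketch of just that accumulation step (no Flink API; `FoldCountSketch` is a hypothetical name, and the key and window-end slots are filled in later by the WindowFunction):

```scala
object FoldCountSketch {
  // Accumulator shape matches the Flink example: (key, windowEnd, count).
  type Acc = (String, Long, Int)

  // Each incoming reading only bumps the count; the window never has
  // to retain the readings themselves just to count them.
  def foldCount(readings: Seq[Double]): Acc =
    readings.foldLeft(("", 0L, 0): Acc) {
      case ((key, end, n), _) => (key, end, n + 1)
    }
}
```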
Incremental Window Aggregation with ReduceFunction
The following example shows how an incremental ReduceFunction can be combined with a WindowFunction to return the smallest event in a window along with the start time of the window.
val input: DataStream[SensorReading] = ...
input
  .keyBy(<key selector>)
  .timeWindow(<window assigner>)
  .reduce(
    (r1: SensorReading, r2: SensorReading) => if (r1.value > r2.value) r2 else r1,
    (key: String,
     window: TimeWindow,
     minReadings: Iterable[SensorReading],
     out: Collector[(Long, SensorReading)]) => {
      val min = minReadings.iterator.next()
      out.collect((window.getStart, min))
    }
  )
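The reduce lambda above is an ordinary binary min-by-value; here is a self-contained plain-Scala sketch of the same comparison (no Flink API; the `SensorReading` fields are assumed from the example's usage):

```scala
object ReduceMinSketch {
  case class SensorReading(id: String, timestamp: Long, value: Double)

  // Same comparison as the Flink ReduceFunction above: keep the
  // reading with the smaller value.
  def min(r1: SensorReading, r2: SensorReading): SensorReading =
    if (r1.value > r2.value) r2 else r1

  // Applying it across a window's readings yields the smallest one,
  // which is the single element the WindowFunction then receives.
  def windowMin(readings: Seq[SensorReading]): SensorReading =
    readings.reduce(min)
}
```

Because reduce only ever holds one element of state per window, it gives the same incremental-aggregation benefit as the fold variant.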
If you want more, see the windowing documentation:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html
Upvotes: 1