Can someone throw some light on practical use cases of GroupCombine on a grouped DataSet in Apache Flink?
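GroupCombine is used for optimization purposes. Unlike GroupReduce, it does not shuffle any data; it only works on the records that are already in each individual partition. This reduces the amount of data that has to be sent over the network to the next reduce operation. Because it only sees one partition's records (and, depending on available memory, possibly only part of a group at a time), its output is a partial result that a subsequent GroupReduce still has to finish. In simple words, it is a local reduce operation.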
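To make the "local reduce" idea concrete, here is a minimal plain-Java simulation of a word count split across two partitions. It deliberately does not use Flink: in a real Flink job the same pattern is usually written as `groupBy(...).combineGroup(...)` followed by `groupBy(...).reduceGroup(...)`. The class and method names below (`CombineSketch`, `combinePartition`, `reduceAll`, `recordsShuffled`) are hypothetical and only illustrate the data flow: each partition is pre-aggregated without any shuffle, and only the partial counts would cross the network to the final reduce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (hypothetical names, no Flink dependency) of a word
// count where each partition is combined locally before the "shuffle".
final class CombineSketch {

    // Local combine: looks only at one partition, so nothing leaves the partition.
    // Emits one (word, partialCount) entry per distinct word in that partition.
    static Map<String, Integer> combinePartition(List<String> partition) {
        Map<String, Integer> partial = new HashMap<>();
        for (String word : partition) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    // Final reduce: sums the partial counts for each key after the shuffle.
    static Map<String, Integer> reduceAll(List<Map<String, Integer>> partials) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> result.merge(word, count, Integer::sum));
        }
        return result;
    }

    // Number of records that would be sent over the network after combining.
    static int recordsShuffled(List<Map<String, Integer>> partials) {
        int n = 0;
        for (Map<String, Integer> p : partials) {
            n += p.size();
        }
        return n;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("a", "b", "a", "a"),
                List.of("b", "b", "a", "c"));

        List<Map<String, Integer>> partials = new ArrayList<>();
        for (List<String> partition : partitions) {
            partials.add(combinePartition(partition));
        }

        int withoutCombine = partitions.stream().mapToInt(List::size).sum();
        System.out.println("records shuffled without combine: " + withoutCombine); // 8
        System.out.println("records shuffled with combine: " + recordsShuffled(partials)); // 5
        System.out.println("final counts: " + reduceAll(partials)); // {a=4, b=3, c=1}
    }
}
```

The final result is the same either way; the combine step only changes how many records (8 versus 5 here) have to be shipped to the reducers. The savings grow with the number of repeated keys per partition.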
If you are familiar with MapReduce in Hadoop, it has a combiner operation as well. GroupCombine in Flink serves exactly the same purpose.
Here is a visual representation of the combiner in Hadoop:
Hope this helps!