Reputation: 45
Such as in hadoop , there is a shuffle phase between map and reduce . And I want to know if there is such a stage in flink, and how it works .Because I have read a lot of websites, they did not mention much about that.Such as a wordcount demo , it has a flatmap,key and sum.Are there always a shuffle phase between two operators ?And can I get the Intermediate data between these operators?
Upvotes: 1
Views: 574
Reputation: 1881
Shuffle is not always performed and it depends on only specific operators. In case of your example, the keyby step in the wordCount example introduces a hash partitioner which performs shuffling of the data based on the key.
In other cases for example - if you want to just process and filter your data without some form of aggregation and then write somewhere, then each of your partitions would hold its own data and there wouldn't be any kind of shuffling involved.
So to answer your questions -
Upvotes: 2