Reputation: 11
Please let me know the difference between partial sort, total sort, and secondary sort in Hadoop.
Upvotes: 1
Views: 2283
Reputation: 38910
Partial Sort:
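This is the default behaviour. A job with N reducers generates N output files. Each file is sorted by key within itself, but there is no ordering across the files, so concatenating them does not give a globally sorted result.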
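To make the partial sort case concrete, here is a toy simulation in plain Python. It is not the Hadoop API: `hash_partition` and `partial_sort` are hypothetical names, and the byte-sum hash is only an assumed stand-in for a hash partitioner so that the run is deterministic.

```python
# Toy simulation of a partial sort. Plain Python, not the Hadoop API.

def hash_partition(key, num_reducers):
    # Assumed stand-in for a hash partitioner: sum of the key's bytes.
    # (Python's built-in hash() on str changes between runs.)
    return sum(key.encode("utf-8")) % num_reducers


def partial_sort(records, num_reducers):
    # One bucket per reducer; the partitioner picks the bucket for each key.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        partitions[hash_partition(key, num_reducers)].append((key, value))
    # Each reducer receives its keys in sorted order, so each "file" is sorted.
    return [sorted(p, key=lambda kv: kv[0]) for p in partitions]


records = [("pear", 1), ("apple", 2), ("fig", 3),
           ("kiwi", 4), ("plum", 5), ("date", 6)]
files = partial_sort(records, 3)
concatenated = [kv for f in files for kv in f]

for i, f in enumerate(files):
    print("part-%d:" % i, [k for k, _ in f])
print("concatenated:", [k for k, _ in concatenated])
```

Every `part-N` list is in key order, but reading the parts one after another does not give one sorted sequence, because keys were spread over the reducers by hash and not by range.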
Total Sort:
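All key-value pairs for a particular key will reach the same reducer. This is decided by the partitioner on the map side. Combiners, if configured, act as semi-reducers on the map side and pre-aggregate the values of a key before they are sent to the reducer.
For a total sort, the output has to be ordered by key across the whole job. With a single reducer, the output will be a single file having all the output sorted by key. With several reducers, a range partitioner such as TotalOrderPartitioner sends each reducer a contiguous range of keys, so the output files, read in order, form one sorted sequence.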
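A rough sketch of the range-partitioning idea behind a total sort with more than one reducer, again as a plain-Python simulation and not Hadoop code. `range_partition`, `total_sort` and the hand-picked `split_points` are assumptions for illustration; in Hadoop the split points would normally come from sampling the input.

```python
# Toy simulation of a total sort using range partitioning.
# Plain Python, not the Hadoop API.
from bisect import bisect_right


def range_partition(key, split_points):
    # Keys below split_points[0] go to reducer 0, keys between
    # split_points[0] and split_points[1] go to reducer 1, and so on.
    return bisect_right(split_points, key)


def total_sort(records, split_points):
    partitions = [[] for _ in range(len(split_points) + 1)]
    for key, value in records:
        partitions[range_partition(key, split_points)].append((key, value))
    # Each reducer still sorts only its own keys.
    return [sorted(p, key=lambda kv: kv[0]) for p in partitions]


records = [("pear", 1), ("apple", 2), ("fig", 3),
           ("kiwi", 4), ("plum", 5), ("date", 6)]
split_points = ["f", "m"]  # hand-picked for this example (assumption)
files = total_sort(records, split_points)
merged = [kv for f in files for kv in f]

for i, f in enumerate(files):
    print("part-%d:" % i, [k for k, _ in f])
print("merged:", [k for k, _ in merged])
```

Because every key in one part is smaller than every key in the next part, reading the parts in order is already globally sorted and no final merge step is needed.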
Secondary Sort:
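Used to control the order in which the values of a key reach the reducer, in addition to the order of the keys themselves. The map output key is made a composite key (the natural key plus the field to sort by). Sorting is done on the composite key, while partitioning and grouping are done on the natural key only, so all values of a natural key still reach one reduce call. That is, sorting can be done on two or more field values.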
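The composite-key idea can be sketched in plain Python as follows. This is only a simulation under assumed names (`secondary_sort`, and sample year/reading records made up for the example), not Hadoop code; in a real job the same three roles are played by the sort comparator, the partitioner and the grouping comparator.

```python
# Toy simulation of a secondary sort. Plain Python, not the Hadoop API.
from itertools import groupby


def secondary_sort(records):
    # Build a composite key: (natural key, field to sort the values by).
    composite = [((key, value), value) for key, value in records]
    # Sort step: order by the full composite key.
    composite.sort(key=lambda kv: kv[0])
    # Grouping step: group on the natural key only, so one "reduce call"
    # sees all values of that key, already in sorted order.
    grouped = groupby(composite, key=lambda kv: kv[0][0])
    return {natural: [v for _, v in group] for natural, group in grouped}


# Made-up sample data: (year, reading).
records = [("2016", 18), ("2015", 35), ("2015", 10),
           ("2016", 5), ("2015", 22)]
result = secondary_sort(records)

for year, readings in result.items():
    print(year, readings)
```

The reducer does not have to buffer and sort the values itself; they arrive in order because the sort was done on the composite key.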
Have a look at article1 and article2 and article3
Upvotes: 0
Reputation: 1170
Partial Sort:
The reducer output will be many files, each of which is sorted within itself based on the key.
Total Sort:
The reducer output will be a single file having all the output sorted based on the key.
Secondary Sort:
In this case, we will be able to control the ordering of the values along with the keys. That is, sorting can be done on two or more field values.
Upvotes: 3