I am new to Hadoop and a little confused about it.
In a MapReduce job, the reducer gets a list of values for each key. I want to know the default ordering of the values for each key. Is it the same order in which they were written out by the mapper? Can you change the ordering (e.g. ascending or descending) of the values for each key?
In MapReduce, a few properties affect how map output is delivered to the reducers. Using them to control the order of the values within each key is referred to as secondary sort. Namely, two factors affect this:
The default partitioner is the org.apache.hadoop.mapred.lib.HashPartitioner class, which hashes a record’s key to determine which partition (and therefore which reducer) the record belongs in.
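As a rough illustration, the rule HashPartitioner applies can be sketched in plain Java without any Hadoop dependency. The class name HashPartitionSketch is hypothetical; the formula (the key's hashCode with the sign bit masked off, modulo the number of reduce tasks) mirrors what HashPartitioner's getPartition method computes.

```java
import java.util.List;

// Plain-Java sketch (no Hadoop classes) of how HashPartitioner picks a
// reducer for a record. HashPartitionSketch is a hypothetical name.
class HashPartitionSketch {

    static int getPartition(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE keeps the result non-negative
        // even when hashCode() is negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String key : List.of("apple", "banana", "apple")) {
            // Equal keys always land in the same partition.
            System.out.println(key + " -> partition " + getPartition(key, 4));
        }
    }
}
```

Because only the key is hashed, every record with the same key reaches the same reducer, which is what makes per-key grouping possible.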
Comparators differ by data type. If you want to control the sort order, override compare(WritableComparable, WritableComparable) in a subclass of the WritableComparator class. See the WritableComparator documentation.
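The effect of the sort comparator can be simulated in plain Java. This is a hypothetical sketch, not Hadoop code: a TreeMap stands in for the framework's sort, and the Comparator passed in plays the role that a WritableComparator subclass registered with Job.setSortComparatorClass() would play in a real job. Note that it only orders the keys; ordering the values within a key needs secondary sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop classes) of the sort phase: map output
// records are grouped by key, and keys are visited in the order defined by
// the sort comparator. SortPhaseSketch is a hypothetical name.
class SortPhaseSketch {

    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput,
                                              Comparator<String> sortComparator) {
        Map<String, List<Integer>> grouped = new TreeMap<>(sortComparator);
        for (Map.Entry<String, Integer> record : mapOutput) {
            // Values stay in input order here only because this simulation
            // appends them; Hadoop itself does not promise any value order.
            grouped.computeIfAbsent(record.getKey(), k -> new ArrayList<>()).add(record.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> out = List.of(
            Map.entry("b", 1), Map.entry("a", 2), Map.entry("c", 3), Map.entry("a", 4));
        System.out.println(shuffle(out, Comparator.naturalOrder()).keySet());  // [a, b, c]
        System.out.println(shuffle(out, Comparator.reverseOrder()).keySet());  // [c, b, a]
    }
}
```

Swapping the comparator flips the order in which the reducer sees keys, without changing which values belong to each key.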
Is it the same order in which they were written out by the mapper?
- Yes, often, when there is a single mapper.
However, Hadoop makes no guarantee about the order of values within a key. If your job has more than one mapper, you may not see the same order across two runs with the same input, because different mappers can finish at different times.
Can you change the ordering (e.g. ascending or descending) of the values for each key?
- Yes
This is done using a technique called secondary sort (you can Google the term for further reading).
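A minimal plain-Java simulation of secondary sort, assuming the common pattern of copying the value into a composite key. In a real job the pieces below would correspond roughly to a custom WritableComparable key, a partitioner set with Job.setPartitionerClass(), a sort comparator set with Job.setSortComparatorClass(), and a grouping comparator set with Job.setGroupingComparatorClass(). The names SecondarySortSketch and CompositeKey are hypothetical, and no Hadoop classes are used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the secondary-sort pattern. The value is copied
// into a composite key so that the key sort also orders the values;
// partitioning and grouping look only at the natural key.
class SecondarySortSketch {

    // Hypothetical composite key: natural key plus the value to sort on.
    record CompositeKey(String naturalKey, int value) {}

    // Sort comparator: natural key ascending, then value descending.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing(CompositeKey::naturalKey)
                  .thenComparing(CompositeKey::value, Comparator.reverseOrder());

    // Partitioner: hash only the natural key so every record for a key
    // reaches the same reducer.
    static int partition(CompositeKey key, int numReduceTasks) {
        return (key.naturalKey().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // One reducer's input: sort the composite keys, then group records that
    // share a natural key. The sort makes those records adjacent, which is
    // what a grouping comparator relies on.
    static Map<String, List<Integer>> reduceInput(List<CompositeKey> records) {
        List<CompositeKey> sorted = new ArrayList<>(records);
        sorted.sort(SORT);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.naturalKey(), n -> new ArrayList<>()).add(k.value());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = List.of(
            new CompositeKey("a", 3), new CompositeKey("b", 7),
            new CompositeKey("a", 9), new CompositeKey("a", 1));
        System.out.println(reduceInput(mapOutput)); // {a=[9, 3, 1], b=[7]}
    }
}
```

Descending values are just the choice made in SORT here; replacing Comparator.reverseOrder() with Comparator.naturalOrder() would give ascending values instead.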