ParagFlume

Reputation: 979

Ordering of the list of values for each key in the reducer

I am new to Hadoop and a little confused about it.

In a MapReduce job, the reducer gets a list of values for each key. I want to know what the default ordering of the values for each key is. Is it the same order in which they were written out by the mapper? Can you change the ordering (e.g. ascending or descending) of the values for each key?

Upvotes: 1

Views: 795

Answers (2)

Myles Baker

Reputation: 3760

In MapReduce, a few properties affect how map output is delivered to the reducers. Controlling the order of the values within a key is referred to as a secondary sort. Two factors matter:

  1. Partitioner, which divides the map output among the reducers. Each partition is processed by a reduce task, so the number of partitions is equal to the number of reduce tasks for the job.
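  2. Comparator, which determines the order in which keys are sorted. Hadoop does not sort the values themselves, so a secondary sort moves the value into a composite key and lets the comparator order it.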

The default partitioner is the org.apache.hadoop.mapred.lib.HashPartitioner class, which hashes a record’s key to determine which partition the record belongs in.
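To make the hashing step concrete, here is a minimal plain-Java sketch with no Hadoop dependency. The class name `HashPartitionSketch` and the `assign` helper are made up for illustration; the formula in `partitionFor` is, as far as I know, the one HashPartitioner applies to the key's `hashCode()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: plain Java, not the Hadoop class itself.
class HashPartitionSketch {

    // Clear the sign bit so the result is never negative,
    // then take the remainder by the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical helper: groups keys by the partition they would land in.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            int partition = partitionFor(key, numReduceTasks);
            byPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }
}
```

Equal keys always hash to the same partition, which is why every value for a given key ends up at the same reducer no matter which mapper emitted it.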

Comparators differ by data type. If you want to control the sort order, override compare(WritableComparable, WritableComparable) in a subclass of the WritableComparator class. See the WritableComparator documentation.
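As a rough illustration of what overriding `compare` buys you, here is a plain-Java analogue using `java.util.Comparator` instead of the Hadoop classes. `DescendingKeySketch` and `sortKeys` are hypothetical names. In a real job the same argument swap would go into a `WritableComparator` subclass, which you would register with `Job.setSortComparatorClass`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: stands in for a WritableComparator subclass.
class DescendingKeySketch {

    // Swapping the arguments reverses the natural (ascending) order.
    static final Comparator<String> DESCENDING = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return b.compareTo(a);
        }
    };

    // Returns a sorted copy, leaving the input list untouched.
    static List<String> sortKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(DESCENDING);
        return sorted;
    }
}
```

Swapping the arguments is a little safer than negating the result of `a.compareTo(b)`, since negation can misbehave if a comparison ever returns `Integer.MIN_VALUE`.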

Photo credit: Tom White, Hadoop: The Definitive Guide, 3rd ed.

Upvotes: 1

PonMaran

Reputation: 66

"Is it the same order in which they were written out by the mapper?" Yes, that is true for a single mapper. But if your job has more than one mapper, you may not see the same order across two runs with the same input, because different mappers may finish at different times.

"Can you change the ordering (e.g. ascending or descending) of the values for each key?" Yes. This is done using a technique called "secondary sort" (you can search for it to read more).
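If it helps, the idea behind a secondary sort can be simulated in plain Java without a cluster. This is only a sketch under assumptions: `SecondarySortSketch`, `Pair`, `sample` and `shuffle` are made-up names, and a real job would use a composite `WritableComparable` key together with `Job.setPartitionerClass`, `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`. The value is copied into the key, records are sorted on the full composite key, and grouping looks at the natural key only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: simulates the shuffle in memory.
class SecondarySortSketch {

    // Composite key: the natural key plus the value we want ordered.
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Sort comparator: natural key ascending, then value descending.
    static final Comparator<Pair> SORT = (a, b) -> {
        int byKey = a.key.compareTo(b.key);
        return byKey != 0 ? byKey : Integer.compare(b.value, a.value);
    };

    // Made-up map output, deliberately emitted out of order.
    static List<Pair> sample() {
        List<Pair> out = new ArrayList<>();
        out.add(new Pair("b", 1));
        out.add(new Pair("a", 3));
        out.add(new Pair("a", 7));
        out.add(new Pair("b", 9));
        out.add(new Pair("a", 5));
        return out;
    }

    // Sort on the composite key, then group on the natural key only,
    // so each "reduce call" sees its values already ordered.
    static Map<String, List<Integer>> shuffle(List<Pair> mapOutput) {
        List<Pair> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Pair p : sorted) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    public static void main(String[] args) {
        System.out.println(shuffle(sample())); // prints {a=[7, 5, 3], b=[9, 1]}
    }
}
```

For ascending values, change the last line of `SORT` to `Integer.compare(a.value, b.value)`. In a real job you would also partition on the natural key alone, so that all composite keys sharing it reach the same reducer.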

Upvotes: 1
