Chandan Kumar Bala

Reputation: 27

Does the default hash partitioner still work if a custom partitioner is defined in Hadoop Map Reduce?

As I am new to Hadoop, I tried out the sample code from http://www.tutorialspoint.com/map_reduce/map_reduce_partitioner.htm. I found that the program uses 3 different partitions based on age group, and 3 reducers are also used, which is expected. But in the reducer code (here the key is the gender, Male/Female) I still get the key together with a list of values. I assumed that this grouping of values into a list is done by the hash partitioner. But as I have defined getPartition() myself, who does this list creation?

Upvotes: 1

Views: 444

Answers (2)

Nicomak

Reputation: 2339

Simple explanation of the getPartition() method

If your job has 3 reducers, they are indexed by integers as well: 0, 1 and 2.

The purpose of the getPartition() method is to take as a parameter each (key, value) pair of your map output and decide whether that pair should go to reducer 0, 1 or 2. That's why the getPartition() method's return type is int.

So all map outputs which are (after being analysed by getPartition()) assigned to reducer 2 will be written to the same partition, also indexed as 2. That partition will sit inside the mapper, waiting for reducer 2 to fetch it.

Who creates this partition, you ask? According to my findings, it's a class called MapFileOutputFormat, in a method called getEntry(). As the class name suggests, it is probably a class responsible for managing the map output data.

The HashPartitioner is the default partitioner, used only when you don't define any partitioner for the job. It is based on the hashcode of the (key, value) pair's key only, so that all pairs with the same key (i.e. the same hashcode) end up in the same partition, which is the default behaviour in MapReduce.
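For reference, HashPartitioner boils down to one line: the key's hashCode (with the sign bit masked off) modulo the number of reducers. Here is that arithmetic as a standalone sketch, written in plain Java without the Hadoop Partitioner class so it runs on its own:

```java
public class HashPartitionDemo {

    // Same arithmetic as Hadoop's HashPartitioner#getPartition:
    // mask off the sign bit, then take the modulo over the reducer count.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Every occurrence of the same key lands in the same partition,
        // so a single reducer sees all the values for that key.
        System.out.println(getPartition("Male", 3));
        System.out.println(getPartition("Male", 3));   // same as the line above
        System.out.println(getPartition("Female", 3));
    }
}
```

Because the partition index depends only on the key, identical keys always go to the same reducer, which is what makes the reducer-side "list of values per key" possible.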

In your tutorial

The code you referred to in your tutorial uses a custom partitioner, whose implementation of the getPartition() method assigns age groups to specific partitions (under 20 years old go to reducer 0, between 20 and 30 go to reducer 1, etc.).

This custom partitioner (CaderPartitioner) is the Partitioner of the MapReduce job because it was set using job.setPartitionerClass(). There is only one partitioner per job, so the HashPartitioner is never used in this job and does absolutely nothing in your case.

So to answer your question, if I understood it well, CaderPartitioner is responsible for deciding how to separate the map outputs into partitions, which will then end up in separate reducers.
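The tutorial's CaderPartitioner follows the same pattern as HashPartitioner, just with age thresholds instead of hash codes. Below is a simplified standalone sketch of that decision logic; note the real class extends Partitioner<Text, Text> and first parses the age out of the tab-separated record value, which is omitted here so the example runs on its own:

```java
public class AgePartitionDemo {

    // Decision logic only: map an age to a reducer index, following
    // the tutorial's three age groups.
    static int getPartition(int age, int numReduceTasks) {
        if (numReduceTasks == 0) {
            return 0;                  // degenerate case: no reducers
        }
        if (age <= 20) {
            return 0;                  // youngest group -> reducer 0
        } else if (age <= 30) {
            return 1 % numReduceTasks; // middle group   -> reducer 1
        } else {
            return 2 % numReduceTasks; // oldest group   -> reducer 2
        }
    }

    public static void main(String[] args) {
        System.out.println(getPartition(15, 3)); // 0
        System.out.println(getPartition(25, 3)); // 1
        System.out.println(getPartition(45, 3)); // 2
    }
}
```

Either way, the grouping of all values for one key into a list happens later, during the shuffle/sort phase on the reducer side; the partitioner only decides which reducer each pair is sent to.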

Upvotes: 0

rraghuva
rraghuva

Reputation: 131

In the above example code, we have the following driver code in the run() method:

  Configuration conf = getConf();

  Job job = new Job(conf, "topsal");
  job.setJarByClass(PartitionerExample.class);

  FileInputFormat.setInputPaths(job, new Path(arg[0]));
  FileOutputFormat.setOutputPath(job,new Path(arg[1]));

  job.setMapperClass(MapClass.class);

  job.setMapOutputKeyClass(Text.class);
  job.setMapOutputValueClass(Text.class);

  //set partitioner statement

  job.setPartitionerClass(CaderPartitioner.class);
  job.setReducerClass(ReduceClass.class);
  job.setNumReduceTasks(3);
  job.setInputFormatClass(TextInputFormat.class);

  job.setOutputFormatClass(TextOutputFormat.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(Text.class);

  System.exit(job.waitForCompletion(true)? 0 : 1);
  return 0;

Here you can see that it sets the CaderPartitioner class as the partitioner for the above MR job. And as per the MapReduce specification, only when a program does not set a custom partitioner does the default HashPartitioner come into the picture.

So in the above scenario CaderPartitioner will be used to do the partitioning for this MR job. Since it has 3 conditions, it will divide the input keys into 3 different groups and send each group to a different reducer, and the reducers will run accordingly.

Hope this helps.

Upvotes: 0
