Reputation: 6139
I am running out a minute issue.
I am trying to get different file for different keys from Reducer.
Partitioner
public class customPartitioner extends Partitioner<Text, NullWritable> implements
Configurable {
private Configuration configuration;
@Override
public Configuration getConf() {
return configuration;
}
public int getPartition(Text key, NullWritable value, int numPartitions) {
return Math.abs(key.hashCode()) % numPartitions;
}
}
And I set the following in my driver class
job0.setPartitionerClass(customPartitioner.class);
job0.setNumReduceTasks(5);
For reducer I have 5 keys
[3, 0, 5, 8, 12]
So I need to get 5 different files.
But once I run this code I am getting 5 part files but the results are not expected.
OUTPUT
Found 6 items
-rw-r--r-- 3 sreeveni root 0 2015-12-09 11:44 /OUT/Part/OUT/_SUCCESS
-rw-r--r-- 3 sreeveni root 0 2015-12-09 11:44 /OUT/Part/OUT/part-r-00000
-rw-r--r-- 3 sreeveni root 4 2015-12-09 11:44 /OUT/Part/OUT/part-r-00001
-rw-r--r-- 3 sreeveni root 0 2015-12-09 11:44 /OUT/Part/OUT/part-r-00002
-rw-r--r-- 3 sreeveni root 4 2015-12-09 11:44 /OUT/Part/OUT/part-r-00003
-rw-r--r-- 3 sreeveni root 3 2015-12-09 11:44 /OUT/Part/OUT/part-r-00004
In that 2 files are empty and the other contains
sreeveni@machine10:~$ hadoop fs -cat /OUT/Part/OUT/part-r-00001
3
8
sreeveni@machine10:~$ hadoop fs -cat /OUT/Part/OUT/part-r-00003
0
5
sreeveni@machine10:~$ hadoop fs -cat /OUT/Part/OUT/part-r-00004
12
Why 2 keys come under one file?
Am I doing any mistake in my code? Please help
Upvotes: 1
Views: 59
Reputation: 5521
Your partitioner is doing the right thing, so I'll try and explain why. Let's pass each of your inputs into your partition code and see what comes out. numPartitions
is 5
as it's the number of reducers you set.
int hash = new Text("3").hashCode(); // = 82
hash % numPartitions; // = 2
hash = new Text("0").hashCode(); // = 79
hash % numPartitions; // = 4
hash = new Text("5").hashCode(); // = 84
hash % numPartitions; // = 4
hash = new Text("8").hashCode(); // = 87
hash % numPartitions; // = 2
hash = new Text("12").hashCode(); // = 2530
hash % numPartitions; // = 0
As we can see, we get the same results running it manually. Two keys come under one file because the partitioner assigns them to the same reducer. Partitioning will even out over the course of a larger data set, but you can't expect that code to automatically and evenly distribute all inputs.
Upvotes: 3