USB
USB

Reputation: 6139

different keys goes into 1 file even if using Hadoop custom Partitioner

I am running out a minute issue.

I am trying to get different file for different keys from Reducer.

Partitioner

public class customPartitioner extends Partitioner<Text, NullWritable> implements
Configurable {
private Configuration configuration;

@Override
public Configuration getConf() {
    return configuration;
}

public int getPartition(Text key, NullWritable value, int numPartitions) {
    return Math.abs(key.hashCode()) % numPartitions;
}
}

And I set the following in my driver class

job0.setPartitionerClass(customPartitioner.class);
job0.setNumReduceTasks(5);

For reducer I have 5 keys

[3, 0, 5, 8, 12]

So I need to get 5 different files.

But once I run this code I am getting 5 part files but the results are not expected.

OUTPUT

Found 6 items
-rw-r--r--   3 sreeveni root          0 2015-12-09 11:44 /OUT/Part/OUT/_SUCCESS
-rw-r--r--   3 sreeveni root          0 2015-12-09 11:44 /OUT/Part/OUT/part-r-00000
-rw-r--r--   3 sreeveni root          4 2015-12-09 11:44 /OUT/Part/OUT/part-r-00001
-rw-r--r--   3 sreeveni root          0 2015-12-09 11:44 /OUT/Part/OUT/part-r-00002
-rw-r--r--   3 sreeveni root          4 2015-12-09 11:44 /OUT/Part/OUT/part-r-00003
-rw-r--r--   3 sreeveni root          3 2015-12-09 11:44 /OUT/Part/OUT/part-r-00004

In that 2 files are empty and the other contains

sreeveni@machine10:~$ hadoop fs -cat /OUT/Part/OUT/part-r-00001
3
8
sreeveni@machine10:~$ hadoop fs -cat /OUT/Part/OUT/part-r-00003
0
5
sreeveni@machine10:~$ hadoop fs -cat /OUT/Part/OUT/part-r-00004
12

Why 2 keys come under one file?

Am I doing any mistake in my code? Please help

Upvotes: 1

Views: 59

Answers (1)

Ben Watson
Ben Watson

Reputation: 5521

Your partitioner is doing the right thing, so I'll try and explain why. Let's pass each of your inputs into your partition code and see what comes out. numPartitions is 5 as it's the number of reducers you set.

int hash = new Text("3").hashCode(); // = 82
hash % numPartitions; // = 2

hash = new Text("0").hashCode(); // = 79
hash % numPartitions; // = 4

hash = new Text("5").hashCode(); // = 84
hash % numPartitions; // = 4

hash = new Text("8").hashCode(); // = 87
hash % numPartitions; // = 2

hash = new Text("12").hashCode(); // = 2530
hash % numPartitions; // = 0

As we can see, we get the same results running it manually. Two keys come under one file because the partitioner assigns them to the same reducer. Partitioning will even out over the course of a larger data set, but you can't expect that code to automatically and evenly distribute all inputs.

Upvotes: 3

Related Questions