Partitioner of Hadoop for first two words of key

Question

When I run a Hadoop streaming job, the mapper outputs (key, value) pairs, and each key is a sequence of words separated by whitespace.

I'd like to use a partitioner that returns the hash value of the first two words of the key.

So I implemented it as follows:

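import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public static class CounterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Split on runs of whitespace, ignoring leading/trailing blanks
        String[] words = key.toString().trim().split("\\s+");
        // First two words joined with a space (or the only word), so "ab c" and "a bc" differ
        String prefix = (words.length > 1) ? (words[0] + " " + words[1]) : words[0];
        // Mask the sign bit so the index is non-negative, then map into [0, numPartitions)
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}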
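The first-two-words rule can be checked without a cluster. Below is a minimal plain-Java sketch with no Hadoop dependency; the class `PrefixPartitionDemo`, its method `partitionFor`, and the sample keys are hypothetical. It splits the key on whitespace, joins the first two words with a space, clears the sign bit of the hash, and reduces it modulo `numPartitions`, which is the same idea as `getPartition` above.

```java
import java.util.Arrays;
import java.util.List;

public class PrefixPartitionDemo {

    // Hypothetical helper: partition index computed from the first two
    // whitespace-separated words of the key (or the only word, if just one).
    public static int partitionFor(String key, int numPartitions) {
        String[] words = key.trim().split("\\s+");
        String prefix = (words.length > 1) ? words[0] + " " + words[1] : words[0];
        // Clear the sign bit so the modulo result is never negative.
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "new york city",
                "new york times",
                "new  york",          // double space: same first two words once split on whitespace
                "san francisco bay");
        int numPartitions = 4;
        for (String key : keys) {
            System.out.println(partitionFor(key, numPartitions) + "\t" + key);
        }
    }
}
```

Keys that share their first two words print the same partition number, which is the property the partitioner relies on so that all such keys reach the same reducer.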

My question is: is there a way to do this with the built-in Hadoop library alone, just by changing configuration properties such as the following?

mapred.output.key.comparator.class
stream.map.output.field.separator
stream.num.map.output.key.fields
map.output.key.field.separator
mapred.text.key.comparator.options
...
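For context, the Hadoop Streaming documentation describes how two of these properties shape the mapper output: each output line is split into key and value at the n-th occurrence of `stream.map.output.field.separator`, where n is `stream.num.map.output.key.fields`; if the line has fewer than n separators, the whole line becomes the key and the value is empty. The plain-Java sketch below models only that documented rule (it is not Hadoop's own code) and then splits the key on a space, assumed here as the key field separator for these keys, to pick out the first two words. `StreamingKeySplitDemo`, `splitKeyValue`, `firstKeyFields` and the sample line are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class StreamingKeySplitDemo {

    // Simplified model of the documented rule: the key is everything before the
    // n-th separator and the value is everything after it. With fewer than n
    // separators, the whole line is the key and the value is empty.
    public static String[] splitKeyValue(String line, String sep, int numKeyFields) {
        int from = 0;
        int pos = -1;
        for (int i = 0; i < numKeyFields; i++) {
            pos = line.indexOf(sep, from);
            if (pos < 0) {
                return new String[] {line, ""};
            }
            from = pos + sep.length();
        }
        return new String[] {line.substring(0, pos), line.substring(pos + sep.length())};
    }

    // Hypothetical helper: the first n fields of a key, rejoined with the same separator.
    public static String firstKeyFields(String key, String sep, int n) {
        String[] fields = key.split(Pattern.quote(sep));
        return String.join(sep, Arrays.copyOfRange(fields, 0, Math.min(n, fields.length)));
    }

    public static void main(String[] args) {
        // Tab separator and one key field are the streaming defaults.
        String[] kv = splitKeyValue("new york city\t1", "\t", 1);
        String firstTwo = firstKeyFields(kv[0], " ", 2);
        // prints: key=new york city value=1 firstTwo=new york
        System.out.println("key=" + kv[0] + " value=" + kv[1] + " firstTwo=" + firstTwo);
    }
}
```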

Thanks in advance.

Praveen Sripati · Accepted Answer


The built-in Hadoop library is Java-based, whereas the purpose of streaming is to let you use languages other than Java, which communicate with Hadoop through STDIN/STDOUT.

So I don't see the point of changing the streaming-related properties through the Hadoop API, which is written in Java.

BTW, Configuration#set can be used to set configuration properties, in addition to setting them in the configuration files or on the command line.
