emesday

Hadoop partitioner on the first two words of the key

I am running a Hadoop Streaming job. The mapper emits (key, value) pairs, where the key is a sequence of words separated by whitespace.

I'd like to use a partitioner that returns the hash value of only the first two words of the key.

I implemented it as follows:

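import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public static class CounterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Split on runs of whitespace so leading or repeated spaces don't produce empty "words".
        String[] words = key.toString().trim().split("\\s+");
        // Hash only the first two words (or the single word if there is only one).
        String prefix = (words.length > 1) ? (words[0] + " " + words[1]) : words[0];
        // Clear the sign bit so the result always lies in [0, numPartitions).
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}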
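To see the partitioning rule in isolation, here is a minimal plain-Java sketch with no Hadoop dependency. The class FirstTwoWordsPartitionDemo and its partitionFor method are hypothetical names used only for illustration; they hash the first two whitespace-separated words (joined with a space), so any keys that share those two words are guaranteed to get the same partition number.

```java
// Standalone sketch (no Hadoop needed) of "partition on the first two words".
// Class and method names are hypothetical, chosen only for this illustration.
public class FirstTwoWordsPartitionDemo {

    // Returns a partition index in [0, numPartitions) based on the first two words of key.
    public static int partitionFor(String key, int numPartitions) {
        // Split on runs of whitespace; trim first so a leading space doesn't create an empty word.
        String[] words = key.trim().split("\\s+");
        // Use the first two words, or the only word if the key has just one.
        String prefix = (words.length > 1) ? words[0] + " " + words[1] : words[0];
        // Mask the sign bit so the modulo result is never negative.
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        String[] keys = {"new york city", "new york state", "los angeles county", "hadoop"};
        for (String k : keys) {
            System.out.println(k + " -> " + partitionFor(k, numPartitions));
        }
    }
}
```

Running main prints one line per key. "new york city" and "new york state" always report the same partition; the actual numbers depend on String.hashCode().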

My question: is there a way to do this with the built-in Hadoop library, just by changing configuration properties such as

mapred.output.key.comparator.class
stream.map.output.field.separator
stream.num.map.output.key.fields
map.output.key.field.separator
mapred.text.key.comparator.options
...
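For context on the properties listed above: the Hadoop Streaming documentation describes how stream.map.output.field.separator and stream.num.map.output.key.fields decide where the key ends in each mapper output line, how map.output.key.field.separator splits that key into fields, and how org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner (selected with the -partitioner option) can partition on a field range such as -k1,2 (set through mapred.text.key.partitioner.options in older releases; property names changed across Hadoop versions, so check the docs for yours). The sketch below only simulates that splitting in plain Java so the effect of each setting is visible. The class and helper names are hypothetical, and it does not reproduce Hadoop's actual hashing.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Plain-Java simulation of the documented Streaming key/value and key-field behaviour.
// Class and method names are hypothetical; this is not Hadoop's own code.
public class StreamingKeyFieldsDemo {

    // Key = text before the n-th field separator, value = the rest.
    // If the line has fewer separators, the whole line is the key and the value is empty.
    public static String[] splitKeyValue(String line, char sep, int numKeyFields) {
        int pos = -1;
        for (int i = 0; i < numKeyFields; i++) {
            pos = line.indexOf(sep, pos + 1);
            if (pos < 0) {
                return new String[] {line, ""};
            }
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    // Returns key fields from..to (1-based, inclusive) re-joined with keySep,
    // roughly what a "-k1,2" key-field spec selects.
    public static String selectFields(String key, char keySep, int from, int to) {
        String sepStr = String.valueOf(keySep);
        String[] fields = key.split(Pattern.quote(sepStr), -1);
        int end = Math.min(to, fields.length);
        if (from < 1 || from > end) {
            return "";
        }
        return String.join(sepStr, Arrays.copyOfRange(fields, from - 1, end));
    }

    public static void main(String[] args) {
        // Mapper output line: the words form the key, a count is the value, tab in between.
        String line = "new york city\t3";
        // Default-like settings: tab separator, 1 key field -> key is everything before the first tab.
        String[] kv = splitKeyValue(line, '\t', 1);
        // Split the key on spaces and keep fields 1..2 as the part to partition on.
        String partitionOn = selectFields(kv[0], ' ', 1, 2);
        // prints "key=new york city value=3 partitionOn=new york"
        System.out.println("key=" + kv[0] + " value=" + kv[1] + " partitionOn=" + partitionOn);
    }
}
```

With a space as the key-field separator and a -k1,2 spec, only "new york" would feed the partition decision, which is the behaviour the custom partitioner above implements by hand.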

Thanks in advance.

Upvotes: 1

Answers (1)

Praveen Sripati

The built-in Hadoop library is Java-based, while the purpose of Streaming is to let you write mappers and reducers in languages other than Java, which talk to Hadoop over STDIN/STDOUT.

I don't see the point of changing the Streaming-related properties through the Hadoop API, which is written in Java.

BTW, Configuration#set can be used to set configuration properties, in addition to setting them in the configuration files or on the command line.

Upvotes: 2
