Saeed Nasehi

Reputation: 1000

How to define an array in a Hadoop partitioner

I am new to Hadoop and MapReduce programming and don't know what I should do. I want to define an array of int in a Hadoop partitioner. I want to fill this array in the main function and use its contents in the partitioner. I have tried using IntWritable and an array of it, but neither worked. I also tried IntArrayWritable, but that didn't work either. I would be grateful if someone could help me. Thank you so much.

public static IntWritable[] h = new IntWritable[1];

public static void main(String[] args) throws Exception {
    h[0] = new IntWritable(1);
}

public static class CaderPartitioner extends Partitioner <Text,IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return h[0].get();
    }
}

Upvotes: 2

Views: 513

Answers (2)

Meeran0823

Reputation: 94

If you have a limited number of values, you can do it the following way. Set the values on the Configuration object in the main method:

    Configuration conf = new Configuration();
    conf.setInt("key1", value1);
    conf.setInt("key2", value2);

Then implement the Configurable interface in your Partitioner class to get the configuration object, and read the key/values from it inside your Partitioner:

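import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class testPartitioner extends Partitioner<Text, IntWritable> implements Configurable {

    private Configuration config = null;

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Look up a value by the same key that was set in main();
        // getInt needs a default, so 0 is returned if the key is missing
        int value1 = getConf().getInt("key1", 0);
        // do stuff on value1

        return 0;
    }

    @Override
    public Configuration getConf() {
        return this.config;
    }

    // Called by the framework with the job configuration when the partitioner is instantiated
    @Override
    public void setConf(Configuration configuration) {
        this.config = configuration;
    }
}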

Supporting link: https://cornercases.wordpress.com/2011/05/06/an-example-configurable-partitioner/

Note: if you have a huge number of values in a file, it is better to find a way to read cache files from the job object in the Partitioner.
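For an actual array rather than a handful of separate keys, one option is to pack the whole int array into a single comma-separated string, store it under one configuration key, and parse it back inside setConf. Below is a minimal sketch of only the packing and unpacking step, written without Hadoop dependencies so it runs on its own. The class name IntArrayCodec and the key name "partitioner.h" used in the comments are hypothetical, not part of Hadoop.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper: converts an int[] to and from a comma-separated string,
// a form that fits into a single Configuration entry.
final class IntArrayCodec {

    // Driver side, e.g.: conf.set("partitioner.h", IntArrayCodec.encode(h));
    static String encode(int[] values) {
        return Arrays.stream(values)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(","));
    }

    // Partitioner side, e.g. in setConf:
    //   h = IntArrayCodec.decode(conf.get("partitioner.h", ""));
    static int[] decode(String encoded) {
        if (encoded == null || encoded.trim().isEmpty()) {
            return new int[0]; // nothing was configured
        }
        return Arrays.stream(encoded.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        int[] h = {1, 5, 9};
        String packed = encode(h);
        System.out.println(packed);                          // 1,5,9
        System.out.println(Arrays.toString(decode(packed))); // [1, 5, 9]
    }
}
```

Decoding once in setConf and keeping the result in a field avoids re-parsing the string on every getPartition call.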

Upvotes: 1

Binary Nerd

Reputation: 13927

Here's a refactored version of the partitioner. The main changes are:

  1. Removed the main(), which isn't needed; initialization should be done in the constructor
  2. Removed static from the class and member variables

public class CaderPartitioner extends Partitioner<Text,IntWritable> {

    private IntWritable[] h;

    public CaderPartitioner() {
        h = new IntWritable[1];
        h[0] = new IntWritable(1);
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return h[0].get();
    }
}

Notes:

  • h doesn't need to be a Writable, unless you have additional logic not included in the question.
  • It isn't clear what h[] is for. Are you going to configure it? In that case the partitioner will probably need to implement Configurable so you can use a Configuration object to set the array up in some way.
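To illustrate the first note, here is a minimal sketch of the same lookup with a plain int[] instead of IntWritable[]. It is written as a standalone class (the class and method names are hypothetical) so it can run without Hadoop; in the real partitioner the same logic would sit inside getPartition. The modulo step is an addition that is not in the code above: it assumes you want to guard against table entries outside the range 0 to numReduceTasks - 1, which is the range getPartition is expected to return.

```java
// Hypothetical stand-in for the partitioner's lookup logic, using a plain int[].
final class PlainArrayPartitionSketch {

    // Lookup table; in a real job it would be filled in the constructor or in setConf.
    private final int[] h;

    PlainArrayPartitionSketch(int[] h) {
        this.h = h.clone(); // defensive copy so later changes by the caller have no effect
    }

    // Mirrors the return value of getPartition(key, value, numReduceTasks).
    int partitionFor(int numReduceTasks) {
        // floorMod keeps the result in [0, numReduceTasks) even for negative entries
        return Math.floorMod(h[0], numReduceTasks);
    }

    public static void main(String[] args) {
        PlainArrayPartitionSketch p = new PlainArrayPartitionSketch(new int[]{1});
        System.out.println(p.partitionFor(4)); // 1
    }
}
```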

Upvotes: 1
