HHH

Reputation: 6485

How to pass an object as value in Hadoop

Is it allowed to pass an object (like a tree) as the output value of a mapper in Hadoop? If so, how?

Upvotes: 0

Views: 2523

Answers (1)

Chris White

Reputation: 30089

To expand on Tariq's links, and to detail one possible implementation for a <Text, IntWritable> TreeMap:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class TreeMapWritable extends TreeMap<Text, IntWritable> 
                             implements Writable {

    @Override
    public void write(DataOutput out) throws IOException {
        // write out the number of entries
        out.writeInt(size());
        // output each entry pair
        for (Map.Entry<Text, IntWritable> entry : entrySet()) {
            entry.getKey().write(out);
            entry.getValue().write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // clear current contents - hadoop re-uses objects
        // between calls to your map / reduce methods
        clear();

        // read how many items to expect
        int count = in.readInt();
        // deserialize a key and value pair, insert into map
        while (count-- > 0) {
            Text key = new Text();
            key.readFields(in);

            IntWritable value = new IntWritable();
            value.readFields(in);

            put(key, value);
        }
    }
}

Basically, the default serialization factory in Hadoop expects the objects you output to implement the Writable interface (the readFields and write methods detailed above). In this way you can extend pretty much any class to retrofit the serialization methods.
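
For completeness, here's a minimal sketch (not part of the original answer) of how the class above might be emitted from a mapper. The TreeMapMapper name, the fixed output key, and the per-line word counting are illustrative assumptions:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TreeMapMapper
        extends Mapper<LongWritable, Text, Text, TreeMapWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // build a small word -> count map for this input line
        TreeMapWritable counts = new TreeMapWritable();
        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            Text word = new Text(token);
            IntWritable current = counts.get(word);
            counts.put(word, new IntWritable(current == null ? 1 : current.get() + 1));
        }
        // the whole map is serialized through its write() method
        context.write(new Text("line"), counts);
    }
}

In the driver you would also call job.setMapOutputValueClass(TreeMapWritable.class) so the framework knows which class to instantiate (and call readFields on) when deserializing the value.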

Another option is to enable Java serialization (org.apache.hadoop.io.serializer.JavaSerialization, which uses the default Java serialization mechanism) by configuring the io.serializations property, but I wouldn't recommend that.
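
If you did go that route, a rough sketch (my assumption, not from the answer) of the configuration is below; note that org.apache.hadoop.io.serializer.WritableSerialization should stay in the list so ordinary Writable types keep working, and anything serialized this way must implement java.io.Serializable:

import org.apache.hadoop.conf.Configuration;

public class EnableJavaSerialization {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // keep WritableSerialization so existing Writable types still work,
        // then add JavaSerialization for plain Serializable classes
        conf.setStrings("io.serializations",
                "org.apache.hadoop.io.serializer.WritableSerialization",
                "org.apache.hadoop.io.serializer.JavaSerialization");
    }
}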

Upvotes: 3
