Reputation: 6485
Is it allowed to pass an object (like a tree) as the output value of a mapper in Hadoop? If so, how?
Upvotes: 0
Views: 2523
Reputation: 30089
To expand on Tariq's links, and to detail one possible implementation for a <Text, IntWritable> treemap:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class TreeMapWritable extends TreeMap<Text, IntWritable>
        implements Writable {

    @Override
    public void write(DataOutput out) throws IOException {
        // write out the number of entries
        out.writeInt(size());
        // output each key / value pair
        for (Map.Entry<Text, IntWritable> entry : entrySet()) {
            entry.getKey().write(out);
            entry.getValue().write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // clear current contents - hadoop re-uses objects
        // between calls to your map / reduce methods
        clear();
        // read how many entries to expect
        int count = in.readInt();
        // deserialize each key / value pair and insert it into the map
        while (count-- > 0) {
            Text key = new Text();
            key.readFields(in);
            IntWritable value = new IntWritable();
            value.readFields(in);
            put(key, value);
        }
    }
}
Basically, the default serialization factory in Hadoop expects output objects to implement the Writable interface (the readFields and write methods detailed above). This way you can extend pretty much any existing class and retrofit the serialization methods onto it.
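For completeness, here is a minimal sketch of how the class might be emitted from a mapper. The mapper name, its token-counting logic, and the output key are hypothetical, purely to show the wiring:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenCountMapper
        extends Mapper<LongWritable, Text, Text, TreeMapWritable> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // build one map per input line, counting each token
        TreeMapWritable counts = new TreeMapWritable();
        for (String token : line.toString().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            Text word = new Text(token);
            IntWritable current = counts.get(word);
            counts.put(word, new IntWritable(current == null ? 1 : current.get() + 1));
        }
        // hypothetical output key - the whole map travels as the value
        context.write(new Text("line-" + offset.get()), counts);
    }
}

The driver would also need to declare the value type, e.g. job.setMapOutputValueClass(TreeMapWritable.class), so the framework knows to instantiate your class when deserializing.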
Another option is to enable Java serialization (org.apache.hadoop.io.serializer.JavaSerialization, which falls back on the default Java serialization mechanism) by configuring the io.serializations property, but I wouldn't recommend that - it's less compact and less efficient than Writables.
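If you do go that route, the property can be set along these lines (a sketch; WritableSerialization should be kept on the list so normal Writable types keep working, and your value classes would then need to implement java.io.Serializable):

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// keep the Writable serializer and append the Java one
conf.set("io.serializations",
        "org.apache.hadoop.io.serializer.WritableSerialization,"
      + "org.apache.hadoop.io.serializer.JavaSerialization");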
Upvotes: 3