Reputation: 637
I'm having a bit of difficulty understanding how, in Hadoop, data is put into the map and reduce functions. I know that we can define the input format and output format, and the key types for input and output. But if, for example, we want an object to be the input type, how does Hadoop internally do that?
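Say I had a custom type like this (just a sketch of what I mean; Person and its fields are made-up names):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Made-up example type; I want something like this as a map input/output value.
public class Person implements Writable {
    private String name;
    private int age;

    public Person() {} // Hadoop needs a no-arg constructor to deserialize

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name); // serialize fields in a fixed order
        out.writeInt(age);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF(); // read them back in the same order
        age = in.readInt();
    }
}

How does Hadoop move something like this between the map and reduce functions?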
Thanks...
Upvotes: 3
Views: 2321
Reputation: 34184
You can extend Hadoop's InputFormat and OutputFormat classes to create your own custom formats. An example would be formatting the output of your MapReduce job as JSON, something like this:
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class JsonOutputFormat extends TextOutputFormat<Text, IntWritable> {

    @Override
    public RecordWriter<Text, IntWritable> getRecordWriter(
            TaskAttemptContext context) throws IOException,
            InterruptedException {
        Configuration conf = context.getConfiguration();
        Path path = getOutputPath(context);
        FileSystem fs = path.getFileSystem(conf);
        // Note: every task writes to the same file name here; with multiple
        // reducers you would want a per-task file name instead.
        FSDataOutputStream out =
                fs.create(new Path(path, context.getJobName()));
        return new JsonRecordWriter(out);
    }

    private static class JsonRecordWriter extends
            LineRecordWriter<Text, IntWritable> {

        boolean firstRecord = true;

        public JsonRecordWriter(DataOutputStream out)
                throws IOException {
            super(out);
            // Open the JSON object when the writer is created.
            // writeBytes emits one byte per character; writeChars would
            // emit two-byte UTF-16 code units.
            out.writeBytes("{");
        }

        @Override
        public synchronized void write(Text key, IntWritable value)
                throws IOException {
            // Prefix every record except the first with a comma separator.
            if (!firstRecord) {
                out.writeBytes(",\r\n");
            }
            firstRecord = false;
            out.writeBytes("\"" + key.toString() + "\":\""
                    + value.toString() + "\"");
        }

        @Override
        public synchronized void close(TaskAttemptContext context)
                throws IOException {
            // Close the JSON object before the stream is closed.
            out.writeBytes("}");
            super.close(context);
        }
    }
}
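To plug this in, you'd register it as the job's output format in your driver; a minimal sketch (JsonJobDriver is a made-up name, and the mapper/reducer setup is omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JsonJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "json output example");
        job.setJarByClass(JsonJobDriver.class);
        // Mapper/Reducer classes omitted; the output key/value types
        // must match what JsonOutputFormat expects.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setOutputFormatClass(JsonOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}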
Upvotes: 7