Reputation: 5557
Friends,
I am new to MapReduce and am trying an example that runs only a Mapper, but the output is not what I expect. Please help me find what I am missing here:
Code part:
Imports:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
Driver Program
Configuration conf = new Configuration();
Job job = new Job(conf, "SampleProgram");
job.setJarByClass(SampleMR.class); // class that contains mapper and reducer
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class); // reducer class
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setNumReduceTasks(0); // map-only while debugging
FileInputFormat.setInputPaths(job, new Path("/tmp/"));
FileOutputFormat.setOutputPath(job, new Path("/tmp/out")); // adjust directories as required
// waitForCompletion submits the job itself, so a separate job.submit() is not needed
boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
Mapper Program
public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable idx, Text value, Context context) throws IOException, InterruptedException {
        String[] tokens = value.toString().split("|");
        String keyPrefix = tokens[0] + tokens[1];
        context.write(new Text(keyPrefix), value);
    }
}
There is a reducer phase as well, but I have set the number of reduce tasks to 0 to debug the issue. The mapper is not behaving correctly.
For the input
379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2
The expected Map output is
379782759851005ABCDEFG [Blank Space] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2
The actual output of my Mapper is
3 [Blank Space] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2
It looks like the key contains only the first character of the expected output. The same happens with the value if I try to write tokens[4] as the value to the context. Something seems to go wrong while splitting the string.
Any insight into what could be going wrong?
Upvotes: 0
Views: 82
Reputation: 3554
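You need to escape the pipe character. String.split takes a regular expression, and an unescaped "|" is the alternation operator, so it matches the empty string between every character instead of the literal pipe. See the link below: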
Splitting string with pipe character ("|")
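As a rough illustration of the difference, here is a small standalone sketch that uses the input line from the question. The class name PipeSplitDemo and the helper buildKey are made up for this example and are not part of the original job; the exact tokens produced by the unescaped split can differ between Java versions, so the sketch only prints how many there are.

```java
import java.util.regex.Pattern;

public class PipeSplitDemo {

    // Hypothetical helper: builds the key from the first two pipe-delimited fields.
    public static String buildKey(String line) {
        // "\\|" is the regex \| which matches a literal pipe character.
        String[] tokens = line.split("\\|");
        return tokens[0] + tokens[1];
    }

    public static void main(String[] args) {
        String line = "379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2";

        // Unescaped: "|" is regex alternation between two empty patterns,
        // so the string is split between every character.
        String[] unescaped = line.split("|");
        System.out.println("unescaped token count: " + unescaped.length);

        // Escaped: split on the literal pipe, giving the five fields.
        String[] escaped = line.split("\\|");
        System.out.println("escaped token count: " + escaped.length);

        // Pattern.quote is an alternative that avoids hand-written escaping.
        String[] quoted = line.split(Pattern.quote("|"));
        System.out.println("quoted token count: " + quoted.length);

        System.out.println("key: " + buildKey(line));
    }
}
```

In the mapper from the question, the equivalent change would be to call value.toString().split("\\|") instead of split("|").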
Upvotes: 1