Reputation: 101
I want to compare two text files line by line to find out whether they are equal. How can I do this with Hadoop MapReduce?
static int i = 0;

public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
    String line = value.toString();
    i++; // used as a line number
    output.collect(new Text(line), new IntWritable(i));
}
I tried to map each line to its line number. But how can I reduce it and compare it with the other file?
Upvotes: 3
Views: 8572
Reputation:
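Comparing two text files line by line is equivalent to joining the two files in MapReduce. To join two text files, use two mappers that emit the same keys. In your case the key can be the byte offset at which each line starts (this is the key that TextInputFormat passes to the mapper) and the value can be the line itself. The MultipleInputs.addInputPath() method lets you assign a separate mapper to each input file. Because the key is a byte offset rather than a line number, two lines only pair up when the text before them has the same length in bytes in both files.
Below is the complete Java program for comparing two text files with MapReduce.
The program takes three arguments: file 1, file 2, and the output path.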
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
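public class CompareTwoFiles {

    // Mapper for the first file: passes (byte offset, line) through unchanged.
    public static class Map extends
            Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    // Mapper for the second file: same logic, registered separately
    // through MultipleInputs.
    public static class Map2 extends
            Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    // Reducer: receives the lines from both files that share a byte offset.
    public static class Reduce extends
            Reducer<LongWritable, Text, LongWritable, Text> {
        @Override
        public void reduce(LongWritable key, Iterable<Text> values,
                Context context) throws IOException, InterruptedException {
            String[] lines = new String[2];
            int i = 0;
            for (Text text : values) {
                if (i < 2) { // guard against more than two values per key
                    lines[i] = text.toString();
                }
                i++;
            }
            if (i < 2) {
                // This offset occurs in only one of the two files.
                context.write(key,
                        new Text("only in one file: " + lines[0]));
            } else if (lines[0].equals(lines[1])) {
                context.write(key, new Text("same"));
            } else {
                // The order of values is not guaranteed, so lines[0]
                // is not necessarily the line from the first file.
                context.write(key,
                        new Text(lines[0] + " vs " + lines[1]));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "fs.defaultFS" replaces the deprecated "fs.default.name" key
        // in Hadoop 2 and later; adjust the URI to your cluster.
        conf.set("fs.defaultFS", "hdfs://localhost:8020");
        // Job.getInstance replaces the deprecated new Job(conf)
        // constructor in Hadoop 2 and later.
        Job job = Job.getInstance(conf);
        job.setJarByClass(CompareTwoFiles.class);
        job.setJobName("Compare Two Files and Identify the Difference");
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        // One mapper per input file; both emit (byte offset, line).
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, Map.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, Map2.class);
        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}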
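To get a feel for what an offset-keyed join produces without starting a cluster, the same idea can be simulated in plain Java. This is only a rough sketch and does not use Hadoop at all: the class and method names (OffsetJoinSketch, mapFile, compare) are made up for illustration, the input is assumed to be UTF-8 text with "\n" line endings, and the "only in one file" branch is an addition for offsets that appear in a single input.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the offset-keyed join; no Hadoop classes are used.
// All names here are hypothetical and exist only for this illustration.
class OffsetJoinSketch {

    // "Map" step: record (byte offset of the line start -> line) for one file.
    // Assumes UTF-8 text with '\n' line endings.
    static void mapFile(String content, Map<Long, List<String>> grouped) {
        if (content.isEmpty()) {
            return; // an empty file produces no records
        }
        long offset = 0;
        for (String line : content.split("\n")) {
            grouped.computeIfAbsent(offset, k -> new ArrayList<>()).add(line);
            // Advance past the line and its '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
    }

    // "Reduce" step: compare the lines that share an offset.
    static List<String> compare(String file1, String file2) {
        Map<Long, List<String>> grouped = new TreeMap<>(); // sorted by offset
        mapFile(file1, grouped);
        mapFile(file2, grouped);

        List<String> report = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e : grouped.entrySet()) {
            List<String> lines = e.getValue();
            String result;
            if (lines.size() < 2) {
                result = "only in one file: " + lines.get(0);
            } else if (lines.get(0).equals(lines.get(1))) {
                result = "same";
            } else {
                result = lines.get(0) + " vs " + lines.get(1);
            }
            report.add(e.getKey() + "\t" + result);
        }
        return report;
    }

    public static void main(String[] args) {
        // Same line lengths: every offset pairs up.
        compare("apple\nbanana\ncherry\n", "apple\nbanano\ncherry\n")
                .forEach(System.out::println);
        // prints "0\tsame", "6\tbanana vs banano", "13\tsame"

        // A shorter second line shifts the offsets of everything after it.
        compare("apple\nbanana\ncherry\n", "apple\nkiwi\ncherry\n")
                .forEach(System.out::println);
        // prints "0\tsame", "6\tbanana vs kiwi",
        // "11\tonly in one file: cherry", "13\tonly in one file: cherry"
    }
}
```

The second call shows the main limitation of keying on byte offsets: once one line differs in length, the identical "cherry" lines no longer share a key and are reported separately. If the files can differ in line length, keying on a real line number instead of the offset would be needed.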
Upvotes: 3