PMR

Reputation: 101

Comparing two text files using Hadoop MapReduce

I want to compare two text files line by line to find out whether they are equal. How can I do this with Hadoop MapReduce programming?

static int i = 0;

public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    String line = value.toString();
    i++; // used as a line number
    output.collect(new Text(line), new IntWritable(i));
}

I tried to map each line to its line number. But how can I reduce it and compare it with the other file?

Upvotes: 3

Views: 8572

Answers (1)

user1112259

Reputation:

Comparing two text files line by line is equivalent to joining the two files in MapReduce programming. To join two text files you use two mappers that emit the same keys. In your case the key can be the line's byte offset (which is what TextInputFormat already supplies as the mapper key) and the value the line itself. Note that offsets from the two files line up only as long as all preceding lines have the same lengths in both files. The MultipleInputs class is used to attach a different mapper to each input file.
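As a plain-Java illustration of why the byte offset works as a join key, here is a small sketch of the pairing the reducer ends up seeing (the class name and file contents are made up for the example):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OffsetJoinSketch {

    // Map each line of a file to its starting byte offset, the way
    // TextInputFormat keys lines ('\n' counts as one byte).
    static Map<Long, String> byOffset(String[] lines) {
        Map<Long, String> map = new LinkedHashMap<>();
        long offset = 0;
        for (String line : lines) {
            map.put(offset, line);
            offset += line.getBytes().length + 1; // +1 for the newline
        }
        return map;
    }

    public static void main(String[] args) {
        String[] fileA = {"alpha", "beta", "gamma"};
        String[] fileB = {"alpha", "beta", "delta"};
        Map<Long, String> a = byOffset(fileA);
        Map<Long, String> b = byOffset(fileB);
        // Emulate the reducer: the two values sharing an offset key
        // are compared against each other.
        for (Map.Entry<Long, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (e.getValue().equals(other)) {
                System.out.println(e.getKey() + "\tsame");
            } else {
                System.out.println(e.getKey() + "\t" + e.getValue() + " vs " + other);
            }
        }
        // prints:
        // 0    same
        // 6    same
        // 11   gamma vs delta
    }
}
```

As soon as one pair of lines differs in length, the offsets of all following lines diverge and the join keys no longer match, which is the main limitation of this approach.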

Please find below a detailed program for comparing two text files in MapReduce programming using Java.

The program takes three arguments: file 1, file 2, and the output directory.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompareTwoFiles {

    public static class Map extends
            Mapper<LongWritable, Text, LongWritable, Text> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (byte offset, line) for the first input file.
            context.write(key, value);
        }
    }

    public static class Map2 extends
            Mapper<LongWritable, Text, LongWritable, Text> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (byte offset, line) for the second input file.
            context.write(key, value);
        }
    }

    public static class Reduce extends
            Reducer<LongWritable, Text, LongWritable, Text> {

        @Override
        public void reduce(LongWritable key, Iterable<Text> values,
                Context context) throws IOException, InterruptedException {
            // With offset keys, each key should receive one line from each file.
            String[] lines = new String[2];
            int i = 0;
            for (Text text : values) {
                lines[i] = text.toString();
                i++;
            }
            if (i < 2) {
                // The key appeared in only one file (the files have different
                // lengths, or the byte offsets no longer align).
                context.write(key, new Text("only in one file: " + lines[0]));
            } else if (lines[0].equals(lines[1])) {
                context.write(key, new Text("same"));
            } else {
                context.write(key,
                        new Text(lines[0] + "     vs    " + lines[1]));
            }

        }

    }

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:8020");
        Job job = Job.getInstance(conf);
        job.setJarByClass(CompareTwoFiles.class);
        job.setJobName("Compare Two Files and Identify the Difference");
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, Map.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, Map2.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }

}
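Assuming the class has been packaged into a jar (the jar name and HDFS paths below are made up for the example), the job can be submitted like this:

```shell
# The output directory must not exist before the job runs.
hadoop jar compare-files.jar CompareTwoFiles \
    /user/hadoop/file1.txt /user/hadoop/file2.txt /user/hadoop/diff-output
```

The output directory will then contain part files listing, for each byte offset, either "same" or the two differing lines.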

Upvotes: 3
