Bingo

Reputation: 56

If two mappers output the same key, what will the input to the reducer be?

I have the following doubt while learning MapReduce. It would be a great help if someone could answer.

I have two mappers working on the same file; I configured them using MultipleInputs.

Mapper 1 expected output (after extracting a few columns of the file):

a - 1234
b - 3456
c - 1345

Mapper 2 expected output (after extracting a few columns of the same file):

a - Monday
b - Tuesday
c - Wednesday

There is also a reducer that just outputs the key-value pair it gets as input. Since I know that values with the same key are shuffled together into a list, I expected the output to be:

a - [1234,Monday]
b - [3456, Tuesday]
c - [1345, Wednesday]

But I am getting some weird output. I guess only one mapper is being run. Is my expectation wrong? Will the output of each mapper be shuffled separately? Will both mappers run in parallel?

Excuse me if it's a lame question; please understand that I am new to Hadoop and MapReduce.

Below is the code

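import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Mapper 1: emits (column 0, column 1), e.g. (a, 1234)
public class numbermapper extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String record = value.toString();
        String[] parts = record.split(",");
        System.out.println("***Mapper number output " + parts[0] + "  " + parts[1]);
        context.write(new Text(parts[0]), new Text(parts[1]));
    }
}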

// Mapper 2: emits (column 0, column 2), e.g. (a, Monday)
public class weekmapper extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String record = value.toString();
        String[] parts = record.split(",");
        System.out.println("***Mapper week output " + parts[0] + "   " + parts[2]);
        context.write(new Text(parts[0]), new Text(parts[2]));
    }
}

// Reducer
public class rjoinreducer extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Text values, Context context)
            throws IOException, InterruptedException {
        context.write(key, values);
    }
}

//Driver class
public class driver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Reduce-side join");
        job.setJarByClass(numbermapper.class);
        job.setReducerClass(rjoinreducer.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Register each mapper against the input path given in args[0].
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, numbermapper.class);
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, weekmapper.class);

        // Set the output path and remove any previous output before running.
        Path outputPath = new Path(args[1]);
        FileOutputFormat.setOutputPath(job, outputPath);
        outputPath.getFileSystem(conf).delete(outputPath);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

And this is the output I got:

a     Monday
b     Tuesday
c     Wednesday

Dataset used

a,1234,Monday
b,3456,Tuesday
c,1345,Wednesday

Upvotes: 0

Views: 1440

Answers (1)

Bingo

Reputation: 56

MultipleInputs was taking just one file and running only one mapper on it, because I had given the same path for both mappers.

When I copied the dataset to a second file and ran the same program on the two files (same content, different file names), I got the expected output.
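
One way to picture this is sketched below in plain Java, without Hadoop. It assumes the association between input paths and mapper classes behaves like a map keyed by path, which is consistent with the behaviour described above but is an assumption here, not a confirmed detail of the library. The class `PathRegistrySketch`, its `register` method, and the file names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a path -> mapper registry; this is not Hadoop code.
public class PathRegistrySketch {

    // Each row is {inputPath, mapperName}; a repeated path overwrites the earlier entry.
    public static Map<String, String> register(String[][] registrations) {
        Map<String, String> mapperByPath = new LinkedHashMap<>();
        for (String[] r : registrations) {
            mapperByPath.put(r[0], r[1]); // same key: the later mapper replaces the earlier one
        }
        return mapperByPath;
    }

    public static void main(String[] args) {
        // Same path registered twice, as in the driver: only one entry survives.
        System.out.println(register(new String[][] {
            {"input.csv", "numbermapper"},
            {"input.csv", "weekmapper"}}));

        // Two distinct paths: both mappers stay registered.
        System.out.println(register(new String[][] {
            {"input1.csv", "numbermapper"},
            {"input2.csv", "weekmapper"}}));
    }
}
```

Under that assumption, the mapper registered last is the one that runs, which matches the output above containing only the weekday values.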

So I now understand that the output from different mapper classes is also combined by key, not just the output from the same mapper class.
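
The grouping described above can be sketched in plain Java, without Hadoop. The class `ShuffleSketch` and its `group` method are hypothetical names, and a `TreeMap` stands in for the sort-and-shuffle phase. In a real job the order of values within a key is not guaranteed, so this is only an approximation of the behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of grouping mapper output by key; this is not Hadoop code.
public class ShuffleSketch {

    // Groups {key, value} pairs emitted by any number of mappers into key -> list of values.
    public static Map<String, List<String>> group(List<String[]> mapperOutputs) {
        Map<String, List<String>> grouped = new TreeMap<>(); // keys kept in sorted order
        for (String[] kv : mapperOutputs) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> outputs = new ArrayList<>();
        // Pairs as numbermapper would emit them.
        outputs.add(new String[] {"a", "1234"});
        outputs.add(new String[] {"b", "3456"});
        outputs.add(new String[] {"c", "1345"});
        // Pairs as weekmapper would emit them.
        outputs.add(new String[] {"a", "Monday"});
        outputs.add(new String[] {"b", "Tuesday"});
        outputs.add(new String[] {"c", "Wednesday"});

        // Each key is printed once, with the values from both mappers.
        group(outputs).forEach((k, v) -> System.out.println(k + " - " + v));
    }
}
```

In Hadoop itself, `Reducer.reduce` receives the key together with an `Iterable<Text>` of its values. A `reduce` method declared with a single `Text` value, as in the reducer in the question, does not override it, so the default implementation runs and writes each value on its own line.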

Thanks for trying to help!

Upvotes: 3
