Reputation: 11
I have to implement two MapReduce jobs, where a mapper in phase II (Mapper_2) needs the output of the reducer in phase I (Reducer_1) as one of its inputs.
Mapper_2 also takes another input: a big text file (2 TB).
I have written the code below, but my question is: the text input will be split among the nodes in the cluster, but what about the output of Reducer_1? I want each mapper in phase II to see the whole of Reducer_1's output.
MultipleInputs.addInputPath(job, textInputPath, SomeInputFormat.class, Mapper_2.class);
MultipleInputs.addInputPath(job, reducer_1OutputPath, SomeInputFormat.class, Mapper_2.class);
Upvotes: 0
Views: 87
Reputation: 660
Your use of multiple inputs seems fine, but it won't give you what you want: each mapper would only receive a split of Reducer_1's output, not all of it. Instead, look at using the distributed cache to share the output of Reducer_1 with every Mapper_2 instance.
JobConf job = new JobConf();
DistributedCache.addCacheFile(new URI("/path/to/reducer_1/output"), job);
Also, when using the distributed cache, remember to read the cache file in the setup method of Mapper_2.
setup() runs once per mapper before map() is first called, and cleanup() runs once per mapper after the last call to map().
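To make that concrete, here is a minimal sketch of what Mapper_2 might look like with the newer mapreduce API, where the cached Reducer_1 output is loaded into memory in setup() and is then visible to every map() call. The class layout, the tab-separated key/value format, and the field names are my assumptions, not from the question.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Mapper_2 extends Mapper<LongWritable, Text, Text, Text> {

    // Holds the whole Reducer_1 output; populated once per mapper in setup()
    private final Map<String, String> reducerOutput = new HashMap<>();

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Files registered with the distributed cache are copied to the
        // local disk of every node before the mapper's tasks start.
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles == null) {
            return;
        }
        for (URI cacheFile : cacheFiles) {
            // The cached file is available in the task's working
            // directory under its file name.
            String localName = new Path(cacheFile.getPath()).getName();
            try (BufferedReader reader =
                    new BufferedReader(new FileReader(localName))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Assumes Reducer_1 wrote key<TAB>value lines
                    String[] parts = line.split("\t", 2);
                    if (parts.length == 2) {
                        reducerOutput.put(parts[0], parts[1]);
                    }
                }
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... process the 2 TB text input split here, with the complete
        // Reducer_1 output available in reducerOutput ...
    }
}
```

Since the whole file is held in a HashMap, this only works if Reducer_1's output fits in each mapper's heap; if it does not, a map-side join with the distributed cache is not an option and you would need a reduce-side join instead.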
Upvotes: 1