Reputation: 1533
Can I set several mapper classes in one job?
For example, I have a CSV input file on HDFS and two tasks to do. The first is to count two fields from the CSV input file and write the result to one output file. The second is to count another two fields from the same CSV input file and write the result to another output file. The Reducer is the same.
How can I achieve this with just one job so both tasks run at the same time? (I don't want to run the first task and then run the second after the first finishes; I want them to run in parallel.)
I tried the following code:
job1.setMapperClass(Mapper1.class);
job1.setReducerClass(LogReducer.class);
job1.setMapperClass(Mapper2.class);
job1.setReducerClass(LogReducer.class);
I tried it, but it doesn't work: it only shows me the second result, and the first one is gone.
Upvotes: 0
Views: 191
Reputation: 603
Have a look at the MultipleOutputs class in Hadoop to write to multiple files from a reducer. In your reduce method, write each result to one file or the other based on a condition.
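A minimal sketch of that idea, assuming the reducer sums counts per key and routes results by a key prefix; the named outputs "fieldsA" and "fieldsB" and the "A:" key convention are illustrative assumptions, not from the question:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class LogReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // Route the result to one of two named outputs based on the key
        // (the "A:" prefix is an assumed tagging convention).
        if (key.toString().startsWith("A:")) {
            mos.write("fieldsA", key, new IntWritable(sum));
        } else {
            mos.write("fieldsB", key, new IntWritable(sum));
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}

The named outputs also have to be registered in the driver, e.g. MultipleOutputs.addNamedOutput(job, "fieldsA", TextOutputFormat.class, Text.class, IntWritable.class);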
Upvotes: 1
Reputation: 16390
So the question is whether you want one output or two outputs from the reducer. You could map the two inputs, one with Mapper1 and the other with Mapper2, and then pass the merged intermediate results to a single reducer to get one output. That is what the MultipleInputs class does in a single job, and it is configured in the driver class.
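A minimal driver sketch of that configuration, assuming two separate input paths (args[0] and args[1]) and plain-text input; the class names Mapper1, Mapper2, and LogReducer are carried over from the question, and the output key/value types are assumed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MergedDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "two mappers, one reducer");
        job.setJarByClass(MergedDriver.class);

        // Each input path gets its own mapper; both feed the same reducer.
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, Mapper1.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, Mapper2.class);

        job.setReducerClass(LogReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that both mappers must emit the same intermediate key/value types for the single reducer to merge them.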
If you want the reduced results of Mapper1 to be separate from the reduced results of Mapper2, then you need to configure two jobs. The two jobs would have different mappers but would configure the same reducer class.
Upvotes: 2
Reputation: 990
This clearly needs two jobs running in parallel. What is the problem with running two jobs in parallel, given that the mapping tasks and output paths are different? A job can't handle multiple mappers unless they are chained; see the sketch below.
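A minimal sketch of running both jobs in parallel from one driver, assuming both read the same CSV file (args[0]) and write to different output paths; Job.submit() returns immediately, unlike waitForCompletion(), so the two jobs run concurrently and the driver then waits for both. Class names are carried over from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ParallelJobsDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job1 = Job.getInstance(conf, "count fields A");
        job1.setJarByClass(ParallelJobsDriver.class);
        job1.setMapperClass(Mapper1.class);
        job1.setReducerClass(LogReducer.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, new Path(args[1]));

        Job job2 = Job.getInstance(conf, "count fields B");
        job2.setJarByClass(ParallelJobsDriver.class);
        job2.setMapperClass(Mapper2.class);
        job2.setReducerClass(LogReducer.class);   // same reducer class
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job2, new Path(args[0])); // same input file
        FileOutputFormat.setOutputPath(job2, new Path(args[2]));

        // submit() does not block, so both jobs start before either finishes.
        job1.submit();
        job2.submit();

        // Then wait for both to complete.
        boolean ok1 = job1.waitForCompletion(false);
        boolean ok2 = job2.waitForCompletion(false);
        System.exit(ok1 && ok2 ? 0 : 1);
    }
}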
Upvotes: 2