Reputation: 389
Suppose there is a file and two different, independent mappers that should run on that file in parallel. As far as I can tell, doing that requires a copy of the file.
What I want to know is whether the same file can be used for both mappers, which would reduce resource utilization and make the system more time efficient.
Is there any research in this area, or any existing tool in Hadoop, that can help with this?
Upvotes: 1
Views: 855
Reputation: 117
At a high level, I can imagine two scenarios for this question.
Case 1:
If you are trying to write the SAME implementation in both Mapper classes to process the same input file, with the sole aim of efficient resource utilization, this probably isn't the right approach. When a file is saved in the cluster, it is divided into blocks, and those blocks are replicated across data nodes. That already gives you efficient resource utilization, because the data blocks of the input file are processed in PARALLEL.
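To put a rough number on the parallelism described in Case 1, the block count of a file can be estimated with a ceiling division. This is only a plain-Java arithmetic sketch, not Hadoop code: the class and method names are made up for illustration, and it assumes a splittable input and the 128 MB default block size of Hadoop 2.x and later. The actual number of map tasks depends on the input format and split settings, so treat the result as an approximation.

```java
// Plain-Java arithmetic sketch; no Hadoop classes are used here.
// SplitEstimate and blockCount are hypothetical names for illustration.
final class SplitEstimate {
    // 128 MB: the default HDFS block size (dfs.blocksize) in Hadoop 2.x and later.
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks needed to store fileSize bytes (ceiling division).
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }
}
```

For a 1 GiB file and the default block size, `SplitEstimate.blockCount(1L << 30, SplitEstimate.DEFAULT_BLOCK_SIZE)` returns 8, so roughly eight map tasks can work on the file at the same time without any copy being made.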
Case 2:
If you are trying to write two DIFFERENT Mapper implementations (each with its own business logic) for a particular workflow that your business requirements call for, then yes, you can pass the same input file to two different mappers using the MultipleInputs class.
MultipleInputs.addInputPath(job, file1, TextInputFormat.class, Mapper1.class);
MultipleInputs.addInputPath(job, file1, TextInputFormat.class, Mapper2.class);
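Treat this as a workaround at best; whether it fits depends on what you want to implement. MultipleInputs associates each input path with a single mapper class, so when the same path is registered twice it is worth checking that both mappers actually run.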
Thanks.
Upvotes: 0
Reputation: 30089
Assuming that both Mappers have the same K,V signature, you could use a delegating mapper and then call the map method of your two mappers:
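import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DelegatingMapper extends Mapper<LongWritable, Text, Text, Text> {
    // Assumes MyMapper1 and MyMapper2 both extend Mapper<LongWritable, Text, Text, Text>.
    // The fields use the concrete types so that their public overrides can be called from here.
    private MyMapper1 mapper1;
    private MyMapper2 mapper2;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        mapper1 = new MyMapper1();
        mapper1.setup(context);
        mapper2 = new MyMapper2();
        mapper2.setup(context);
    }

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // setup, map and cleanup will need to be overridden as public in each mapper class
        mapper1.map(key, value, context);
        mapper2.map(key, value, context);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mapper1.cleanup(context);
        mapper2.cleanup(context);
    }
}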
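One consequence of delegating is that both mappers write to the same context, so their outputs end up in one stream. A possible way to keep them apart is to tag each emitted key with its origin. The sketch below shows the idea in plain Java, without Hadoop: `LineMapper`, `runOnce` and the `m1:`/`m2:` prefixes are hypothetical names chosen for illustration, not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Plain-Java stand-in for the delegating pattern; these are NOT Hadoop types.
final class DelegationSketch {
    // Hypothetical record handler: gets one input line and an emitter for (key, value) pairs.
    interface LineMapper {
        void map(String line, BiConsumer<String, String> emit);
    }

    // Reads each line once and hands it to both mappers, prefixing every
    // emitted key so the two outputs stay distinguishable downstream.
    static List<String> runOnce(List<String> lines, LineMapper first, LineMapper second) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            first.map(line, (k, v) -> out.add("m1:" + k + "\t" + v));
            second.map(line, (k, v) -> out.add("m2:" + k + "\t" + v));
        }
        return out;
    }
}
```

In a real job the same effect could be had by prefixing the output key inside each mapper, so that the reducer can tell which mapper produced a record. The input is still read only once.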
Upvotes: 3