Reputation: 15599
I want to instantiate an object once and reuse it in every map call. Instantiating it requires a handful of parameters (about 10). I think I should do that in the Mapper.setup method and pass the parameters through the job configuration.
I couldn't find a suitable example. (Note that I am new to Hadoop.)
Basically, what I am looking for is:
public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private static MyParser parser;

    protected void setup(Context context)
            throws IOException, InterruptedException {
        String param1 = ""; // how to get those?
        String param2 = "";
        parser = new MyParser(param1, param2);
    }

    protected void map(LongWritable offset, Text value, Context context)
            throws IOException, InterruptedException {
        String key = parser.parse(value.toString());
        context.write(new Text(key), one);
    }
}
Is this a suitable approach? Is there an alternative?
Sub-question: what if the parameters depend on the file being processed?
Upvotes: 0
Views: 105
Reputation: 97
In the main method, set the parameters on the Configuration object right after creating it. Do this before creating the Job from it, because the Job takes a copy of the configuration:
Configuration con = new Configuration();
con.set("param1", "welcome"); // example value
con.set("param2", "hello"); // example value
Add these lines to the Mapper's setup method. The parameters can be retrieved from the Configuration object, which you get from the context:
Configuration conf = context.getConfiguration();
String param1 = conf.get("param1"); // returns "welcome"
String param2 = conf.get("param2"); // returns "hello"
You can make the parser a static field, and if the parameters come from a file that the mappers need to read, use the distributed cache.
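For the sub-question (parameters that depend on the input file), one possible approach is to store per-file overrides in the same configuration under keys prefixed with the file name, and fall back to the plain key otherwise. The sketch below is only an illustration: `FileParams`, `resolve`, and the `<fileName>.<name>` key convention are hypothetical names, not part of Hadoop. It uses a plain `Map` in place of the configuration so it runs on its own. In a real mapper you would pass `conf::get` as the lookup (it returns null for a missing key, like `Map.get`), and, assuming a FileInputFormat-based input, take the file name from `((FileSplit) context.getInputSplit()).getPath().getName()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FileParams {
    // Hypothetical helper: try the file-specific key "<fileName>.<name>" first,
    // then fall back to the plain "<name>" key. Returns null if neither is set.
    public static String resolve(Function<String, String> lookup, String fileName, String name) {
        String specific = lookup.apply(fileName + "." + name);
        return specific != null ? specific : lookup.apply(name);
    }

    public static void main(String[] args) {
        // Stand-in for the job configuration (assumption: same get-or-null behaviour).
        Map<String, String> settings = new HashMap<>();
        settings.put("param1", "welcome");        // default for all files
        settings.put("a.log.param1", "override"); // override for a.log only

        System.out.println(resolve(settings::get, "a.log", "param1")); // file-specific value
        System.out.println(resolve(settings::get, "b.log", "param1")); // default value
    }
}
```

Since setup runs once per map task and, with the usual file-based input formats, a task processes a single split of one file, resolving the parameters in setup should be enough in that case.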
Upvotes: 1