Reputation: 6139
I am trying to do a chaining job.
At some point I want to access the driver's args (public static void main(String[] args)), say args[0], in the mapper.
Is there a way to access those values in the mapper, rather than passing them to a function and accessing them there?
Alternative Solution
// Set the value before the Job is created: Job copies the Configuration,
// so changes made to conf afterwards are not seen by the tasks.
conf.set("args", args[1]);
job1.setJarByClass(BinningDriver.class);
FileSystem fs1 = FileSystem.get(conf);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(Text.class);
job1.setMapperClass(BinningInput.class);
job1.setInputFormatClass(TextInputFormat.class);
job1.setOutputFormatClass(TextOutputFormat.class);
// Output goes to <user output location>/Indexing
Path out = new Path(args[1] + "/Indexing");
// Remove a previous output directory, otherwise the job fails on start
if (fs1.exists(out)) {
    fs1.delete(out, true);
}
FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, out);
}
Mapper
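@Override
public void setup(Context context) {
    // The job configuration is shipped to every task, so the value set in the driver is visible here
    Configuration conf = context.getConfiguration();
    String param = conf.get("args");
    System.out.println("args:" + param);
}
This works.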
Upvotes: 0
Views: 540
Reputation: 7462
args is the parameter of the main method of the Driver class, so its scope is that method only. The mappers run as separate tasks, usually in other JVMs on other nodes, and cannot see it directly. So, if you want these values in the mapper, you need to pass them explicitly: set small values on the job configuration and read them back from the configuration in the mapper, or, for files, add them to the Distributed Cache.
If you simply want to pass some parameters, check this article, and replace "123" with args[2], or whatever arg you are interested in.
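If more than one argument is needed in the mapper, one option is to store each argument under its own key and rebuild the array on the mapper side. The sketch below shows that round trip in plain Java. A Map stands in for Hadoop's Configuration (where the calls would be conf.set and conf.get), and the class name ArgsInConf and the key names args.count and args.N are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ArgsInConf {
    // Driver side: store each argument under its own key
    // (hypothetical key names; with Hadoop this would be conf.set(key, value)).
    public static void putArgs(Map<String, String> conf, String[] args) {
        conf.put("args.count", Integer.toString(args.length));
        for (int i = 0; i < args.length; i++) {
            conf.put("args." + i, args[i]);
        }
    }

    // Mapper side: rebuild the array from the keys
    // (with Hadoop this would be conf.get(key) inside setup()).
    public static String[] getArgs(Map<String, String> conf) {
        int n = Integer.parseInt(conf.getOrDefault("args.count", "0"));
        String[] args = new String[n];
        for (int i = 0; i < n; i++) {
            args[i] = conf.get("args." + i);
        }
        return args;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        putArgs(conf, new String[] {"/input", "/output", "threshold=5"});
        String[] restored = getArgs(conf);
        System.out.println(String.join(" ", restored));
    }
}
```

In a real job, the putArgs step would run in the driver before the Job is created, and the getArgs step in the mapper's setup method against context.getConfiguration(). One key per argument avoids having to pick a separator that never occurs inside an argument.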
If you want to pass a whole file for processing, add it to the Distributed Cache in the driver and read it in the mapper.
Example:
main method in the Driver class:
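public static void main(String[] args) {
    ...
    // conf is the JobConf of the job (old org.apache.hadoop.mapred API)
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    ...
    try {
        // Register the file given as the third argument; it is copied to every task node
        DistributedCache.addCacheFile(new URI(args[2]), conf);
    } catch (URISyntaxException e) {
        System.err.println(e.toString());
    }
    ....
}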
In the Mapper, before the map() method, define the configure method (I am using Hadoop 1.2.0):
Set<String> lines;

public void configure(JobConf job) {
    lines = new HashSet<>();
    try {
        // Local copies of the files registered with DistributedCache.addCacheFile
        Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
        BufferedReader reader = new BufferedReader(new FileReader(localFiles[0].toString()));
        lines.add(reader.readLine()); // stores only the first line of the file
        reader.close();
    } catch (FileNotFoundException e) {
        System.err.println(e.toString());
    } catch (IOException e) {
        System.err.println(e.toString());
    }
}
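The configure method above stores only the first line of the cached file. If every line is needed, the reading step could look like the following sketch. It is plain Java with no Hadoop dependency: the class name CacheFileLines and the method readAllLines are hypothetical, and the path argument is assumed to be the local path obtained from DistributedCache.getLocalCacheFiles.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CacheFileLines {
    // Reads every line of a local file into a set (duplicate lines collapse).
    public static Set<String> readAllLines(String localPath) throws IOException {
        Set<String> lines = new HashSet<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the locally cached file.
        java.nio.file.Path tmp = Files.createTempFile("cache", ".txt");
        Files.write(tmp, Arrays.asList("alpha", "beta", "alpha"), StandardCharsets.UTF_8);
        System.out.println(readAllLines(tmp.toString()).size()); // prints 2
        Files.delete(tmp);
    }
}
```

Inside configure, the call would then be something like lines = CacheFileLines.readAllLines(localFiles[0].toString()), wrapped in the same IOException handling as above.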
For more information on how to use the Distributed Cache, see the API: http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html
Upvotes: 1