Reputation: 2128
I'm running the classic Hadoop word count program and can't figure out how GenericOptionsParser works in the following scenario.
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
Command to run the word count program:
hadoop jar /home/hduser/WordCount/wordcount.jar WordCount input output
From the above command, GenericOptionsParser picks up input as otherArgs[0] and output as otherArgs[1]. Why doesn't it pick up WordCount as an argument? How exactly does it work?
I've looked at the GenericOptionsParser source code in the Hadoop util package but couldn't make much sense of it. Any guidance would be really helpful.
Upvotes: 4
Views: 8017
Reputation: 1253
Command usage is already explained.
GenericOptionsParser separates Hadoop's generic options from the user's own command-line arguments, such as input, output, and any other application options. Hadoop offers the following generic options:
-D key=value, -fs, -jt, -libjars, -files, etc.
This class not only separates the generic options from the user's command-line arguments but also adds them to the Hadoop Configuration object that is created in the driver method of the MapReduce program.
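The separation described above can be pictured with a small plain-Java sketch. This is only an illustration under stated assumptions, not Hadoop's implementation: the class GenericOptionSplitSketch and its Result holder are hypothetical names, a plain map stands in for the Configuration object, and only the -D option is handled. Details such as option ordering and the other generic options may behave differently in real Hadoop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the behaviour described above; NOT Hadoop's code.
class GenericOptionSplitSketch {

    // Hypothetical holder: "conf" plays the role of the Configuration object,
    // "remaining" plays the role of getRemainingArgs().
    static final class Result {
        final Map<String, String> conf = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    static Result split(String[] args) {
        Result r = new Result();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            String pair = null;
            if (a.equals("-D") && i + 1 < args.length) {
                pair = args[++i];        // form: -D key=value
            } else if (a.startsWith("-D") && a.length() > 2) {
                pair = a.substring(2);   // form: -Dkey=value
            }
            if (pair == null) {
                r.remaining.add(a);      // not a -D option: leave it for the application
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq > 0) {
                r.conf.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
            // A -D value without "key=" is silently dropped in this sketch.
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = split(new String[] {"-Dmapred.reduce.tasks=20", "input", "output"});
        System.out.println("configuration: " + r.conf);
        System.out.println("remaining args: " + r.remaining);
    }
}
```

In this model the -D setting ends up in the configuration map, while input and output are what the application sees as its remaining arguments.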
Instead of calling GenericOptionsParser directly, we can implement Tool and launch the job through ToolRunner, which applies GenericOptionsParser internally.
Upvotes: 2
Reputation: 8522
If the jar you are using here (wordcount.jar) is hadoop-examples*.jar, then it is a runnable jar whose main class is org.apache.hadoop.examples.ExampleDriver.
The first argument is filtered out if it is a valid example name (wordcount, teragen, terasort, etc.).
See the following method
org.apache.hadoop.util.ProgramDriver#driver(String[] args)
After this initial filtering, the example class org.apache.hadoop.examples.WordCount is invoked with the remaining arguments (input output). org.apache.hadoop.examples.WordCount is not called directly.
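The filtering step can be sketched in plain Java as a lookup by program name followed by a call with the rest of the arguments. This is a rough model only, not the ProgramDriver source: ExampleDispatchSketch, addProgram and run are hypothetical names, and a Consumer stands in for invoking the example class's main method.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Rough model of a name-based driver; NOT the Hadoop ProgramDriver source.
class ExampleDispatchSketch {
    // Registered example names mapped to the code that should run for them.
    private final Map<String, Consumer<String[]>> programs = new LinkedHashMap<>();

    void addProgram(String name, Consumer<String[]> program) {
        programs.put(name, program);
    }

    // Returns true if the first argument named a registered program.
    boolean run(String[] args) {
        if (args.length == 0 || !programs.containsKey(args[0])) {
            System.out.println("Valid program names are: " + programs.keySet());
            return false;
        }
        // Drop the program name; the selected program sees only the rest.
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        programs.get(args[0]).accept(rest);
        return true;
    }

    public static void main(String[] args) {
        ExampleDispatchSketch driver = new ExampleDispatchSketch();
        driver.addProgram("wordcount",
                a -> System.out.println("wordcount invoked with " + Arrays.toString(a)));
        driver.run(new String[] {"wordcount", "input", "output"});
    }
}
```

The point of the sketch is that the program name is consumed by the driver, so the selected example never sees it among its own arguments.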
Using GenericOptionsParser lets you specify generic options on the command line itself.
E.g., with generic options you can specify the following:
hadoop jar /home/hduser/WordCount/wordcount.jar WordCount -Dmapred.reduce.tasks=20 input output
Upvotes: 2
Reputation: 4681
Look at the help for the command hadoop jar
RunJar jarFile [mainclass] args...
So the full command would look like:
hadoop jar jarFile [mainclass] args...
When you run
hadoop jar /home/hduser/WordCount/wordcount.jar WordCount input output
This would mean that:
The mainclass (here WordCount) is the class inside the jar that contains the public static void main(String[] args) method that you would like to run. input and output are the args passed to that method.
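The usage line above can be modelled in a few lines of plain Java. This is a simplified sketch, not the RunJar source: HadoopJarArgsSketch and Launch are hypothetical names, and the manifestMainClass parameter is an assumption that stands in for the Main-Class entry of the jar's manifest (null when the jar declares none), which is why [mainclass] is shown as optional.

```java
import java.util.Arrays;

// Simplified model of the "jarFile [mainclass] args..." usage line; NOT the RunJar source.
class HadoopJarArgsSketch {

    // Hypothetical holder for the three parts of the command line.
    static final class Launch {
        final String jarFile;
        final String mainClass;
        final String[] programArgs;

        Launch(String jarFile, String mainClass, String[] programArgs) {
            this.jarFile = jarFile;
            this.mainClass = mainClass;
            this.programArgs = programArgs;
        }
    }

    // args: everything after "hadoop jar".
    // manifestMainClass: the jar's Main-Class entry, or null if it has none (assumption).
    static Launch split(String[] args, String manifestMainClass) {
        if (args.length < 1) {
            throw new IllegalArgumentException("usage: jarFile [mainclass] args...");
        }
        int next = 0;
        String jarFile = args[next++];
        String mainClass = manifestMainClass;
        if (mainClass == null) {
            // No main class in the manifest: the next word names it.
            if (next >= args.length) {
                throw new IllegalArgumentException("usage: jarFile [mainclass] args...");
            }
            mainClass = args[next++];
        }
        // Whatever is left is handed to main(String[] args).
        return new Launch(jarFile, mainClass, Arrays.copyOfRange(args, next, args.length));
    }

    public static void main(String[] args) {
        Launch l = split(new String[] {
            "/home/hduser/WordCount/wordcount.jar", "WordCount", "input", "output"}, null);
        System.out.println("jar:   " + l.jarFile);
        System.out.println("class: " + l.mainClass);
        System.out.println("args:  " + Arrays.toString(l.programArgs));
    }
}
```

Under this model, WordCount is consumed as the class name before main is called, so the program (and therefore GenericOptionsParser) only ever receives input and output.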
Upvotes: 0
Reputation: 900
You are executing a jar file via the hadoop jar command. Look at the syntax: hadoop jar jar_name [mainClass] args
So for your command:
jar_name = /home/hduser/WordCount/wordcount.jar
mainClass = WordCount. This is the name of the class that contains your main function. Note that it is not an actual argument to your program but a hint as to which class contains your main function.
input is your first argument and output is your second argument.
Upvotes: 1