The_Tourist

Reputation: 2128

Hadoop GenericOptionsParser

I'm running the classic Hadoop word count program and can't figure out how GenericOptionsParser works in the following scenario.

String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

Command to run the word count program:

hadoop jar /home/hduser/WordCount/wordcount.jar WordCount input output

With the above command, GenericOptionsParser returns input as otherArgs[0] and output as otherArgs[1]. Why doesn't it pick up WordCount as an argument? How exactly does it work?

I've looked at the GenericOptionsParser source code in Hadoop's util package but couldn't make much sense of it. Any guidance would be really helpful.

Upvotes: 4

Views: 8017

Answers (4)

Naga

Reputation: 1253

Command usage is already explained.

The job of GenericOptionsParser is to separate the generic options from the user's own command-line arguments, such as input, output, and other application options. Hadoop offers the following generic options, among others:

-D key=value
-fs
-jt
-libjars
-files

This class not only separates the generic options from the user's command-line arguments, but also adds all of these generic options to the Hadoop Configuration object that is created in the driver method of the MR program.
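The separation described above can be illustrated with a small, plain-Java sketch. This is a simplified model and not Hadoop's actual implementation: the class name GenericOptionsSketch is hypothetical, a Map stands in for the Configuration object, and only the -D key=value option is handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of generic-option splitting; NOT Hadoop's real GenericOptionsParser. */
final class GenericOptionsSketch {
    /** Hypothetical stand-in for Hadoop's Configuration object. */
    final Map<String, String> conf = new LinkedHashMap<>();
    /** Arguments left over once the generic options have been consumed. */
    private final List<String> remaining = new ArrayList<>();

    GenericOptionsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-D") && i + 1 < args.length) {
                setProperty(args[++i]);            // form: -D key=value
            } else if (arg.startsWith("-D") && arg.contains("=")) {
                setProperty(arg.substring(2));     // form: -Dkey=value
            } else {
                remaining.add(arg);                // application argument, e.g. input/output
            }
        }
    }

    /** Stores "key=value" in the configuration map; input without a key is ignored. */
    private void setProperty(String keyValue) {
        int eq = keyValue.indexOf('=');
        if (eq > 0) {
            conf.put(keyValue.substring(0, eq), keyValue.substring(eq + 1));
        }
    }

    String[] getRemainingArgs() {
        return remaining.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // These are the arguments the main class receives, i.e. after "hadoop jar <jar> WordCount".
        String[] appArgs = {"-Dmapred.reduce.tasks=20", "input", "output"};
        GenericOptionsSketch parser = new GenericOptionsSketch(appArgs);
        System.out.println(parser.conf);                                  // {mapred.reduce.tasks=20}
        System.out.println(String.join(" ", parser.getRemainingArgs()));  // input output
    }
}
```

In this model, otherArgs[0] would be input and otherArgs[1] would be output, while the -D setting ends up in the configuration map rather than in the remaining arguments.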

We can also use Tool and ToolRunner instead of calling GenericOptionsParser directly; ToolRunner applies GenericOptionsParser internally.

Upvotes: 2

SachinJose

Reputation: 8522

If the jar you are using here (wordcount.jar) is hadoop-examples*.jar, then it is a runnable jar whose main class is org.apache.hadoop.examples.ExampleDriver.

The first argument is filtered out if the example name we specify (wordcount, teragen, terasort, etc.) is a valid option.

See the following method

org.apache.hadoop.util.ProgramDriver#driver(String[] args) 

After this initial filtering, the example class org.apache.hadoop.examples.WordCount is invoked with the remaining arguments (input output). org.apache.hadoop.examples.WordCount is not called directly.

Using GenericOptionsParser lets you specify generic options on the command line itself.

For example, with generic options you can run the following:

hadoop jar /home/hduser/WordCount/wordcount.jar WordCount -Dmapred.reduce.tasks=20 input output

Upvotes: 2

FuriousGeorge

Reputation: 4681

Look at the help for the hadoop jar command:

RunJar jarFile [mainclass] args...

So the full command would look like:

hadoop jar jarFile [mainclass] args...

When you run

hadoop jar /home/hduser/WordCount/wordcount.jar WordCount input output

This would mean that:

  • jarFile = /home/hduser/WordCount/wordcount.jar
  • [mainclass] = WordCount
  • args... = input output

The mainclass is the class inside the jar that contains the public static void main(String[] args) method you would like to run.
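The breakdown above can be sketched in plain Java. This is a simplified model, not the real RunJar code: the class RunJarSketch and the method programArgs are hypothetical names, and the sketch assumes the main class is given on the command line (when a jar's manifest already names a Main-Class, the launcher does not consume that position as a class name).

```java
import java.util.Arrays;

/** Simplified model of how a "hadoop jar"-style launcher slices its arguments; NOT the real RunJar. */
final class RunJarSketch {
    /**
     * Returns the arguments that would be handed to the main class.
     * Assumes launcherArgs = { jarFile, mainclass, args... }.
     */
    static String[] programArgs(String[] launcherArgs) {
        if (launcherArgs.length < 2) {
            throw new IllegalArgumentException("usage: jarFile mainclass args...");
        }
        // Index 0 is the jar file and index 1 is the main class;
        // only what follows is passed on to main(String[] args).
        return Arrays.copyOfRange(launcherArgs, 2, launcherArgs.length);
    }

    public static void main(String[] args) {
        String[] launcherArgs = {
            "/home/hduser/WordCount/wordcount.jar", "WordCount", "input", "output"
        };
        System.out.println("jarFile   = " + launcherArgs[0]);
        System.out.println("mainclass = " + launcherArgs[1]);
        System.out.println("args      = " + String.join(" ", programArgs(launcherArgs)));
    }
}
```

Under these assumptions, WordCount is consumed by the launcher before main is ever called, which is why GenericOptionsParser only sees input and output.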

Upvotes: 0

Tanveer

Reputation: 900

You are executing the jar file via the hadoop jar command. Look at its syntax: hadoop jar <jar> [mainClass] args

So for your command:

<jar> = /home/hduser/WordCount/wordcount.jar
mainClass = WordCount
args = input output

WordCount is the name of the class that contains your main function. Please note that it is not an actual argument to your program, but a hint about which class contains your main function. input and output are your program's arguments.

Upvotes: 1
