Reputation: 10497
For example, I have a Hadoop word-count program (found online), WordCount.java:
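import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class WordCount {  // a top-level class cannot be declared static
    public static void main(String[] args) throws Exception {
        // ....
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class); // Why?
    }
}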
I compile it into a JAR and submit it to YARN like this:
hadoop jar wordcount.jar WordCount [input-hdfs] [output-hdfs]
In this command, I have specified (1) the JAR name and (2) the class name.
So Hadoop already knows from its command line that "WordCount" is the class name inside wordcount.jar.
Also, the public class in WordCount.java must be named WordCount; that is a Java rule, right?
Then what's the point of calling setJarByClass(WordCount.class)?
It seems redundant to me. Why is this statement required? Thanks
Upvotes: 0
Views: 361
Reputation: 191701
You can have more than one main method in a single JAR file, so the class name is necessary on the command line unless the JAR's manifest declares a Main-Class.
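Also, the class you pass to job.setJarByClass doesn't need to be the class with the main method; it only needs to be some class packaged in your job's JAR. Hadoop uses it to find the JAR that contains that class, so it can ship that JAR to the cluster nodes that run your map and reduce tasks. The class name on the command line only tells the local hadoop jar launcher which main method to run; it isn't automatically recorded in the job configuration. Hadoop can't otherwise know which JAR belongs to the job, therefore you need to set it in the code as well.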
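To make "find the JAR that contains that class" concrete, here is a minimal, JDK-only sketch of the idea. It is not Hadoop's code: Hadoop delegates this step to ClassUtil.findContainingJar, and the class FindContainingJar and its helper below are hypothetical names for illustration. The sketch asks the class's loader where the .class file came from and, if that location is inside a JAR, returns the JAR's path.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;

public class FindContainingJar {

    // Hypothetical helper illustrating the idea behind setJarByClass:
    // return the path of the JAR that holds cls, or null if the class
    // was not loaded from a JAR.
    static String findContainingJar(Class<?> cls) throws IOException {
        ClassLoader loader = cls.getClassLoader();
        if (loader == null) {
            return null; // bootstrap classes such as java.lang.String have no loader here
        }
        String classFile = cls.getName().replace('.', '/') + ".class";
        Enumeration<URL> urls = loader.getResources(classFile);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            // A class inside a JAR is reported as jar:file:/path/app.jar!/pkg/Name.class
            if ("jar".equals(url.getProtocol())) {
                String path = url.getPath(); // file:/path/app.jar!/pkg/Name.class
                path = path.substring(0, path.indexOf('!'));
                if (path.startsWith("file:")) {
                    path = path.substring("file:".length());
                }
                return URLDecoder.decode(path, StandardCharsets.UTF_8);
            }
        }
        return null; // e.g. the class was loaded from a directory of .class files
    }

    public static void main(String[] args) throws IOException {
        // Prints the JAR path when run from inside a JAR, or "null" when
        // run from a plain directory of compiled classes.
        System.out.println(findContainingJar(FindContainingJar.class));
    }
}
```

This is also why the class passed to setJarByClass can be the mapper, the reducer, or the driver: any class packaged in the same JAR leads to the same file.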
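You could do something like job.setJarByClass(Class.forName(args[2])) if you did want to take the class name from the CLI, though. Note that with the command above, args holds only the input and output paths, so you would have to pass the class name as an extra, third program argument.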
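A small sketch of that CLI approach, kept free of Hadoop dependencies so it runs on its own. The class JarClassFromArgs and the method resolveJarClass are hypothetical; in a real driver you would pass the returned class to job.setJarByClass. It assumes the class name arrives as an optional third argument (after input and output) and falls back to a default class when it is absent.

```java
public class JarClassFromArgs {

    // Hypothetical helper: choose the class to hand to job.setJarByClass(...)
    // from an optional third program argument, else use the given fallback.
    static Class<?> resolveJarClass(String[] args, Class<?> fallback) throws ClassNotFoundException {
        if (args.length > 2) {
            // Throws ClassNotFoundException if the name is not on the classpath.
            return Class.forName(args[2]);
        }
        return fallback;
    }

    public static void main(String[] args) throws Exception {
        Class<?> jarClass = resolveJarClass(args, JarClassFromArgs.class);
        System.out.println("Would call job.setJarByClass(" + jarClass.getName() + ")");
    }
}
```

Keeping a fallback means the original two-argument command still works unchanged.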
Upvotes: 1