Reputation: 1528
I have a situation where my MapReduce job depends on third-party libraries such as hive-hcatalog-xxx.jar. I run all my jobs through Oozie, and the MapReduce jobs are run via a java action. What is the best way to include third-party libraries in my job? I have two options in hand:
1. Bundle all the dependent jars into the main jar to create a fat jar.
2. Keep all the dependent jars in an HDFS location and add them via the -libjars option.
Which one should I choose? Please advise.
As my MapReduce job is invoked through an Oozie java action, the libraries available in the Oozie lib folder are not added to the classpath of the mapper/reducer. If I change this java action to a map-reduce action, will the jars be available?
Thanks in advance.
Upvotes: 1
Views: 885
Reputation: 13402
You can certainly use either of the approaches you suggested, but Oozie already ships a sharelib for hcatalog. You can use it out of the box by setting the oozie.action.sharelib.for.actiontype property in your job.properties. For the java action you can specify:
oozie.action.sharelib.for.java=hcatalog
This will load the libraries from the Oozie sharelib hcatalog directory into your launcher job. This should do the job.
You can check the contents of the hcatalog sharelib here:
hdfs dfs -ls /user/oozie/share/lib/lib_*/hcatalog
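As a rough sketch of where that property lives, the snippet below writes a minimal job.properties. The host names, ports and application path are hypothetical placeholders, and oozie.use.system.libpath=true is included on the assumption that the system sharelib is not already enabled for your workflow.

```shell
# Write a minimal job.properties; the quoted heredoc keeps ${...} literal
# so that Oozie, not the shell, resolves those variables.
cat > job.properties <<'EOF'
# Hypothetical cluster endpoints and application path
nameNode=hdfs://namenode.example.com:8020
jobTracker=resourcemanager.example.com:8032
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/my-workflow

# Assumption: the system sharelib must be switched on for this workflow
oozie.use.system.libpath=true
# From the answer above: add the hcatalog sharelib to java actions
oozie.action.sharelib.for.java=hcatalog
EOF

# Submission would then look like this (hypothetical Oozie URL):
# oozie job -oozie http://oozie.example.com:11000/oozie -config job.properties -run
```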
Upvotes: 1
Reputation: 29165
1. Bundle all the dependent jars into the main jar and create a fat jar. OR 2. Keep all the dependent jars in an HDFS location and add it via -libjars option. Which one I can choose?
Both approaches are used in practice, but I'd suggest the uber jar, i.e. your first approach.
Uber jar: a jar that carries its dependent jars in an internal lib/ folder (a structure known as an 'uber' jar). When you submit the job via a regular 'hadoop jar' command, the jars under lib/ get picked up by the framework because the supplied jar is specified explicitly via conf.setJarByClass or conf.setJar. That is, if this uber jar goes to the JT as the mapred...jar, the framework handles it properly and all the jars under lib/ are placed on the classpath.
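One way to picture the lib/ layout described above is the following sketch. The function name stage_uber_layout and all paths are hypothetical; in practice a build tool would normally do this step for you.

```shell
# Stage compiled classes at the root and dependency jars under lib/,
# which is the layout the uber jar is expected to have.
stage_uber_layout() {
  classes_dir=$1   # directory holding the compiled job classes
  deps_dir=$2      # directory holding third-party jars
  staging_dir=$3   # directory that will be packed into the uber jar

  mkdir -p "$staging_dir/lib"
  cp -R "$classes_dir"/. "$staging_dir"/
  for jar in "$deps_dir"/*.jar; do
    [ -e "$jar" ] || continue   # skip the literal pattern when no jars exist
    cp "$jar" "$staging_dir/lib/"
  done
}

# Example usage (hypothetical paths), followed by packing with the JDK jar tool:
# stage_uber_layout target/classes deps staging
# jar cf myjob-uber.jar -C staging .
```

Listing the result with jar tf myjob-uber.jar should then show the job classes at the top level and the dependencies as lib/ entries.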
Why
The advantage is that you can distribute your uber jar without caring whether the dependencies are installed at the destination, since the uber jar has no external dependencies.
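For comparison, the -libjars route from the question needs a comma-separated list of jars. The helper below is a small sketch for building that list; join_jars, the jar name and the driver class are hypothetical. Note that -libjars is a generic Hadoop option, so it only takes effect if the driver parses generic options (for example by running through ToolRunner).

```shell
# Print all jars in a directory as one comma-separated string.
join_jars() {
  dir=$1
  list=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue        # skip the literal pattern when no jars exist
    list="${list:+$list,}$jar"       # prepend a comma only after the first entry
  done
  printf '%s\n' "$list"
}

# Example usage (hypothetical jar, class and paths); generic options go
# after the main class and before the job's own arguments:
# hadoop jar myjob.jar com.example.MyDriver -libjars "$(join_jars /path/to/deps)" input output
```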
As my MapReduce job is invoked through an Oozie java action, the libraries available in the Oozie lib folder are not added to the classpath of the mapper/reducer. If I change this java action to a map-reduce action, will the jars be available?
Since the answer to the above question is broad, here are sharelib links for CDH4.xx and CDH5.xx, plus "How to configure Mapreduce action with Oozie share lib".
Upvotes: 1