sunitha

Reputation: 1528

Third-party jars in a MapReduce job

I have a situation where my MapReduce job depends on third-party libraries such as hive-hcatalog-xxx.jar. I run all my jobs through Oozie, and the MapReduce jobs are launched via a java action. What is the best way to include third-party libraries in my job? I have two options:

  1. Bundle all the dependent jars into the main jar to create a fat jar.

  2. Keep all the dependent jars in an HDFS location and add them via the -libjars option.

Which one should I choose? Please advise.

Because my MapReduce job is invoked through an Oozie java action, the libraries in the Oozie lib folder are not added to the classpath of the mappers/reducers. If I change this java action to a map-reduce action, will the jars be available?

Thanks in advance.

Upvotes: 1

Views: 885

Answers (2)

YoungHobbit

Reputation: 13402

You can certainly adopt either of the approaches you suggested, but Oozie ships a prepared sharelib for HCatalog. You can use it out of the box by setting the oozie.action.sharelib.for.<actiontype> property in your job.properties. For the java action you can specify:

oozie.action.sharelib.for.java=hcatalog

This loads the libraries from the Oozie hcatalog sharelib into your launcher job, which should do the job.
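Put together, the relevant part of job.properties might look like the sketch below. It assumes the sharelib is installed at the default location and that your Oozie server does not already enable the system libpath by default; oozie.use.system.libpath=true is the switch that lets workflow actions use the Oozie sharelib at all. The rest of your job.properties (nameNode, oozie.wf.application.path, and so on) stays as it is.

```properties
# Allow workflow actions to pick up jars from the Oozie sharelib.
oozie.use.system.libpath=true
# Use the hcatalog sharelib directory for java actions.
oozie.action.sharelib.for.java=hcatalog
```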

You can list the contents of the hcatalog sharelib with:

 hdfs dfs -ls /user/oozie/share/lib/lib_*/hcatalog

Upvotes: 1

Ram Ghadiyaram

Reputation: 29165

1. Bundle all the dependent jars into the main jar and create a fat jar, or 2. Keep all the dependent jars in an HDFS location and add them via the -libjars option. Which one should I choose?

Both approaches are used in practice, but I'd suggest an uber jar, i.e. your first approach.

Uber jar: a jar that carries a lib/ folder containing further dependent jars (a structure known as an 'uber' jar). When you submit the job via a regular 'hadoop jar' command, these lib/*.jar files get picked up by the framework, because the supplied jar is specified explicitly via conf.setJarByClass or conf.setJar. That is, if this user uber jar goes to the JobTracker (JT) as the mapred...jar, the framework handles it properly: the lib/*.jar files are all considered and placed on the classpath.

Why?

The advantage is that you can distribute your uber jar without caring whether the dependencies are installed at the destination, since your uber jar effectively has no external dependencies.
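To make the lib/ layout concrete, here is a minimal Python sketch. Python is used only because a jar is an ordinary zip archive that the standard zipfile module can write; in practice you would produce the jar with your usual Java build tool. The sketch writes the job's own classes at the top level and stores each dependency as a complete jar under lib/. The class names, jar names and placeholder class bytes are hypothetical, and nothing here exercises Hadoop itself.

```python
import io
import zipfile
from typing import Dict, List, Optional


def build_uber_jar(classes: Dict[str, bytes],
                   dep_jars: Dict[str, bytes],
                   main_class: Optional[str] = None) -> bytes:
    """Return the bytes of a jar that nests whole dependency jars under lib/.

    classes:  jar entry name -> bytes of the job's own compiled classes
    dep_jars: file name (e.g. "dep-a.jar") -> raw bytes of a dependency jar
    """
    manifest = "Manifest-Version: 1.0\r\n"
    if main_class:
        # Optional manifest entry naming the driver class.
        manifest += f"Main-Class: {main_class}\r\n"
    manifest += "\r\n"

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as jar:
        # A jar is a zip file; the manifest is conventionally written first.
        jar.writestr("META-INF/MANIFEST.MF", manifest)
        for name, data in classes.items():
            jar.writestr(name, data)
        for name, data in dep_jars.items():
            # Each dependency is stored as a complete jar inside lib/,
            # not unpacked into the main jar's class tree.
            jar.writestr(f"lib/{name}", data)
    return buf.getvalue()


def nested_lib_jars(jar_bytes: bytes) -> List[str]:
    """List the .jar entries sitting directly under lib/."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return sorted(
            n for n in jar.namelist()
            if n.startswith("lib/") and n.endswith(".jar") and n.count("/") == 1
        )


if __name__ == "__main__":
    # Hypothetical names and placeholder class bytes, for illustration only.
    dep = build_uber_jar({"org/example/dep/Util.class": b"\xca\xfe\xba\xbe"}, {})
    uber = build_uber_jar(
        {"com/example/MyDriver.class": b"\xca\xfe\xba\xbe"},
        {"dep-a.jar": dep},
        main_class="com.example.MyDriver",
    )
    print(nested_lib_jars(uber))  # ['lib/dep-a.jar']
```

The nested_lib_jars helper simply lists what ended up under lib/, which is a quick way to inspect a built jar before submitting it.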

Because my MapReduce job is invoked through an Oozie java action, the libraries in the Oozie lib folder are not added to the classpath of the mappers/reducers. If I change this java action to a map-reduce action, will the jars be available?

Since the answer to that question is broad, I'll point you to the sharelib links for CDH4.xx and CDH5.xx, and to How to configure a MapReduce action with the Oozie sharelib.

Upvotes: 1
