Reputation: 35
By default, the Oozie shared lib directory provides libraries for Hive, Pig, and Map-Reduce. If I want to run a Spark job on Oozie, it might be better to add the Spark lib jars to Oozie's shared lib instead of copying them to the app's lib directory.
How can I add Spark lib jars (including spark-core and its dependencies) to Oozie's shared lib? Any comment / answer is appreciated.
Upvotes: 3
Views: 5179
Reputation: 1074
The Spark action is scheduled to be released with Oozie 4.2.0, although the documentation seems to be a bit behind. See the related JIRA: Oozie JIRA - Add spark action executor
Cloudera's CDH 5.4 release already includes it, though; see the official documentation: CDH 5.4 oozie doc - Oozie Spark Action Extension
With older versions of Oozie, the jars can be shared in several ways. The first approach may work best. The complete list is quoted here anyway:
Below are the various ways to include a jar with your workflow:
Set oozie.libpath=/path/to/jars,another/path/to/jars in job.properties.
This is useful if you have many workflows that all need the same jar; you can put it in one place in HDFS and use it with many workflows. The jars will be available to all actions in that workflow. There is no need to ever point this at the ShareLib location. (I see that in a lot of workflows.) Oozie knows where the ShareLib is and will include it automatically if you set oozie.use.system.libpath=true in job.properties.
Create a directory named “lib” next to your workflow.xml in HDFS and put jars in there.
This is useful if you have some jars that you only need for one workflow. Oozie will automatically make those jars available to all actions in that workflow.
Specify the <file> tag in an action with the path to a single jar; you can have multiple <file> tags.
This is useful if you want some jars only for a specific action and not all actions in a workflow. The downside is that you have to specify them in your workflow.xml, so if you ever need to add/remove some jars, you have to change your workflow.xml.
Add jars to the ShareLib (e.g. /user/oozie/share/lib/lib_/pig)
While this will work, it’s not recommended for two reasons: The additional jars will be included with every workflow using that ShareLib, which may be unexpected to those workflows and users. When upgrading the ShareLib, you’ll have to recopy the additional jars to the new ShareLib.
Quoted from Robert Kanter's blog post: How-to: Use the ShareLib in Apache Oozie (CDH 5)
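As a rough illustration of the first approach, a job.properties for a workflow that needs the Spark jars might look like the sketch below. The host name, port, and HDFS directories are placeholders made up for this example, not values from the blog post. Only oozie.libpath and oozie.use.system.libpath come from the quoted text; oozie.wf.application.path is the usual property that points at the workflow directory.

```properties
# Hypothetical cluster address; replace with your own NameNode URI.
nameNode=hdfs://namenode.example.com:8020

# Let Oozie include its ShareLib automatically, so the ShareLib
# location never has to be listed in oozie.libpath.
oozie.use.system.libpath=true

# Hypothetical HDFS directory holding the Spark jars
# (spark-core and its dependencies). Several comma-separated
# paths may be given.
oozie.libpath=${nameNode}/user/someuser/spark-libs

# Hypothetical workflow application directory (contains workflow.xml).
oozie.wf.application.path=${nameNode}/user/someuser/apps/spark-wf
```

The jars would first have to be uploaded to the directory named in oozie.libpath. After that, any other workflow can reuse them by setting the same property.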
Upvotes: 2