Reputation: 303
I'm running MapReduce jobs using Oozie. The workflow only invokes my MapReduce driver class and does nothing else, yet the Oozie workflow uses a lot of memory: it needs a container size of at least 2 GB just to invoke the driver class. Below is my workflow.xml:
<?xml version="1.0" encoding="utf-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="My Job">
<start to="start-job" />
<action name='start-job'>
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${jobQueue}</value>
</property>
</configuration>
<exec>${jobScript}</exec>
<argument>${arguments}</argument>
<argument>${queueName}</argument>
<argument>${wf:id()}</argument>
<file>myPath/MyDriver.sh#MyDriver.sh</file>
</shell>
<ok to="end" />
<error to="kill" />
</action>
<kill name="kill">
<message>Job failed
failed:[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />
</workflow-app>
My shell script (MyDriver.sh) looks like this:
hadoop jar myJar.jar MyDriverClass $1 $2 $3
Why does Oozie take so much memory? How can I reduce Oozie's memory consumption?
Upvotes: 0
Views: 1297
Reputation: 4702
A shell action will start at least two mappers to run your Java class.
You can avoid this by using a java action. Put your jar inside the ${workflow-path}/lib/ directory and change your workflow:
<action name='start-job'>
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${jobQueue}</value>
</property>
</configuration>
<main-class>MyDriverClass</main-class>
<arg>${arguments}</arg>
<arg>${queueName}</arg>
<arg>${wf:id()}</arg>
</java>
<ok to="end" />
<error to="kill" />
</action>
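If the launcher container is still larger than you want after switching to the java action, it may also help to size the launcher explicitly. The fragment below is only a sketch: it assumes a YARN cluster and an Oozie version that honors the `oozie.launcher.` property prefix, and the values (1024 MB, -Xmx768m) are illustrative rather than recommendations. YARN may round the request up to the cluster's `yarn.scheduler.minimum-allocation-mb`.

```xml
<!-- Goes inside the <configuration> element of the action, next to mapred.job.queue.name. -->
<!-- All values are placeholders; tune them for your cluster. -->
<property>
<!-- Container size requested for the launcher's single map task -->
<name>oozie.launcher.mapreduce.map.memory.mb</name>
<value>1024</value>
</property>
<property>
<!-- JVM heap for the launcher map task; keep it below the container size -->
<name>oozie.launcher.mapreduce.map.java.opts</name>
<value>-Xmx768m</value>
</property>
<property>
<!-- Container size for the launcher job's application master -->
<name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
<value>1024</value>
</property>
```

These settings only affect the launcher job that Oozie starts for the action. The memory of the MapReduce job submitted by MyDriverClass is still controlled by that job's own configuration.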
Upvotes: 1