swish

Reputation: 367

Running an Oozie job

I'm trying to configure Oozie to work on my hadoop-2.7.1 cluster. Everything else seems to work fine: YARN, Hue, MapReduce and Spark. Jobs sent with the yarn jar ... command finish correctly, but when I submit a job through Oozie, either from the CLI with oozie job ... -run or from Hue, the job gets stuck at 33% and the node logs show this:

2015-11-06 06:08:56,121 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:18030
2015-11-06 06:08:57,165 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:18030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...

I don't use port 18030 anywhere in my configuration. I probably need to change the host from localhost to the machine's network hostname, but where do I configure that? I've tried changing yarn.resourcemanager.scheduler.address, but that didn't help.

EDIT: I run oozie job -config examples/apps/shell/job.properties -run with job.properties containing:

nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
oozie.libpath=/data/shared/hadoop-2.7.1/etc/hadoop

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/shell

Upvotes: 1

Views: 2363

Answers (1)

Manjunath Ballur

Reputation: 6343

The error occurs while the job is trying to contact the ResourceManager.

The log line quoted in the question is printed by RMProxy.java:

LOG.info("Connecting to ResourceManager at " + rmAddress);

When you use Oozie with MRv1, the jobTracker value in the job.properties file is set to the JobTracker's address:

jobTracker={JobTracker Host}:{JobTracker Port}

When you migrate your Oozie job to MRv2, you need to change job.properties so that jobTracker points to the ResourceManager's address instead:

jobTracker={RM Host}:{RM Port}

For reference, see https://support.pivotal.io/hc/en-us/articles/203355837-How-to-run-a-MapReduce-jar-using-Oozie-workflow, which describes the variable as follows:

jobTracker = Variable to define the resource manager address in case of Yarn implementation. Format: <resourcemanager_hostname>:<port>

EDIT: I went through the Hadoop source code. The only place where port 18030 is used is SLS (the YARN Scheduler Load Simulator).

SLS ships a yarn-site.xml file (at \hadoop-tools\hadoop-sls\src\main\sample-conf\yarn-site.xml) that contains the following configuration:

  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:18030</value>
  </property>

From your description, it seems that the yarn-site.xml actually being picked up is similar to the one shipped with SLS.
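
One way to check this is to look for every yarn-site.xml on the machine and see which of them set the scheduler address. The following is only a rough sketch of such a check, not something taken from Hadoop or Oozie: the helper name find_property is made up for illustration, and it only inspects plain files on disk, so a yarn-site.xml packed inside a jar would not be found this way.

```python
import os
import xml.etree.ElementTree as ET

# Property named in the question and in the SLS sample configuration.
PROPERTY = "yarn.resourcemanager.scheduler.address"


def find_property(root_dir, name=PROPERTY, filename="yarn-site.xml"):
    """Return (path, value) pairs for every `filename` under root_dir that sets `name`.

    Hypothetical helper for illustration; it is not part of Hadoop or Oozie.
    """
    hits = []
    for dirpath, _dirs, files in os.walk(root_dir):
        if filename not in files:
            continue
        path = os.path.join(dirpath, filename)
        try:
            tree = ET.parse(path)
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        # Hadoop config files are a flat list of <property> elements.
        for prop in tree.getroot().iter("property"):
            if (prop.findtext("name") or "").strip() == name:
                hits.append((path, (prop.findtext("value") or "").strip()))
    return hits
```

Calling find_property on the Hadoop installation directory (and on any directory referenced by oozie.libpath) and printing the result should show whether some file still maps the scheduler address to localhost:18030.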

Upvotes: 1
