Reputation: 21
I have a problem configuring an Oozie coordinator on a YARN cluster. It's a Spark job: when I run the workflow from the console, the job is launched and executed correctly by YARN, but when I call the same workflow from a coordinator.xml I get this error:
ERROR org.apache.spark.SparkContext - Error initializing SparkContext.
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
at org.apache.hadoop.fs.Path.<init>(Path.java:135)
at org.apache.hadoop.fs.Path.<init>(Path.java:94)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:337)
The job is never launched on the YARN cluster. It looks like YARN doesn't receive the correct path to the .jar from Oozie. Any ideas?
Here are the simplified coordinator.xml and workflow.xml:
<coordinator-app name="Firebase acquisition process coordinator" frequency="${coord:days(1)}"
                 start="${startTime}" end="${endTime}" timezone="UTC" xmlns="uri:oozie:coordinator:0.5">
    <controls>
        ...
    </controls>
    <action>
        <workflow>
            <app-path>hdfs://ip-111-11-11-111.us-west-2.compute.internal:8020/user/hadoop/emr-spark/</app-path>
        </workflow>
    </action>
</coordinator-app>
<workflow-app name="bbbbbbbbbbbbbbb" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-0324"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-0324">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>client</mode>
            <class>classsxxx.Process</class>
            <jar>hdfs://ip-111-11-11-111.us-west-2.compute.internal:8020/user/hadoop/emr-spark/lib/jarnamex.jar</jar>
            <file>lib#lib</file>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
To be clear, this works: oozie job -config ~/emr-spark/job.properties -run, but this does not: oozie job -run -config ~/emr-coordinator/coordinator.properties
job.properties:
oozie.use.system.libpath=true
send_email=False
dryrun=False
nameNode=hdfs://ip-111-11-11-111.us-west-2.compute.internal:8020
jobTracker=ip-111-11-11-111.us-west-2.compute.internal:8032
oozie.wf.application.path=/user/hadoop/emr-spark
coordinator.properties:
startTime=2017-09-08T19:46Z
endTime=2030-01-01T06:00Z
jobTracker=ip-111-11-11-111.us-west-2.compute.internal:8032
nameNode=hdfs://ip-111-11-11-111.us-west-2.compute.internal:8020
oozie.coord.application.path=hdfs://ip-111-11-11-111.us-west-2.compute.internal:8020/user/hadoop/emr-coordinator
oozie.use.system.libpath=true
Upvotes: 2
Views: 1024
Reputation: 21
When referring to a resource on HDFS, the path has to be given without the hdfs://host:port prefix. The full URI is computed on demand.
So the solution was simply to replace hdfs://ip-111-11-11-111.us-west-2.compute.internal:8020/user/hadoop/emr-spark/workflow.xml with /user/hadoop/emr-spark/workflow.xml, and hdfs://ip-111-11-11-111.us-west-2.compute.internal:8020/user/hadoop/emr-spark/lib/xxxx.jar with /user/hadoop/emr-spark/lib/xxxx.jar,
wherever they appear in workflow.xml, coordinator.xml, or the properties files.
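As a rough sketch of what that change might look like when applied to the files in the question (fragments only, not complete files; the jar name jarnamex.jar is taken from the question, and the app-path is written the way this answer gives it, so adjust both to your own layout):

```xml
<!-- coordinator.xml fragment: app-path without the hdfs://host:port prefix -->
<workflow>
    <app-path>/user/hadoop/emr-spark/workflow.xml</app-path>
</workflow>

<!-- workflow.xml fragment (inside the spark action): jar without the prefix -->
<jar>/user/hadoop/emr-spark/lib/jarnamex.jar</jar>
```

This only illustrates the substitution described above. Whether every path must be changed may depend on your Oozie and Spark versions.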
Upvotes: 0