Pangjiu

Reputation: 21

Running Spark job in Oozie using Yarn-cluster

I have created a Spark job in Oozie, configured to run on yarn-cluster. The Spark program is written in Scala and is very simple: it just initializes the SparkContext, prints "hello world", and stops the SparkContext.
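
For reference, here is a minimal sketch of what the job looks like (the object name matches the <class>com.test1</class> entry in the workflow below; this is a reconstruction, not the exact source):

package com

import org.apache.spark.{SparkConf, SparkContext}

object test1 {
  def main(args: Array[String]): Unit = {
    // The master is not set here; yarn-cluster comes from the Oozie <master> element.
    val conf = new SparkConf().setAppName("MySpark")
    val sc = new SparkContext(conf)
    println("hello world")
    sc.stop()
  }
}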

Below is the workflow.xml:

<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-0177"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-0177">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <mode>cluster</mode>
            <name>MySpark</name>
            <class>com.test1</class>
            <jar>/user/hue/oozie/workspaces/tl_test/lib/testOozie1.jar</jar>
            <spark-opts>--executor-cores 2  --driver-memory 5g --num-executors 2 --executor-memory 5g</spark-opts>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

However, I am getting the following error:

 Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Can not create a Path from an empty string
    java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
        at org.apache.hadoop.fs.Path.<init>(Path.java:135)
        at org.apache.hadoop.fs.Path.<init>(Path.java:94)
        at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:191)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$3.apply(Client.scala:254)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$3.apply(Client.scala:248)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:248)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:384)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:102)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:623)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:651)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
        at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:105)
        at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:96)
        at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:46)
        at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:40)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:228)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

    Oozie Launcher failed, finishing Hadoop job gracefully

    Oozie Launcher, uploading action data to HDFS sequence file: hdfs://MYRNDSVRVM350:8020/user/oozie-oozi/0000084-150828094553499-oozie-oozi-W/spark-156b--spark/action-data.seq

    Oozie Launcher ends

Please help, as I'm totally stuck. Thanks.

Upvotes: 2

Views: 4978

Answers (2)

Tao Cheng

Reputation: 121

I met exactly the same weird problem. It turns out that multiple consecutive spaces in <spark-opts> cause this uninformative error: you have two spaces between --executor-cores 2 and --driver-memory 5g. Presumably the options string is split on single spaces, so the double space yields an empty token that Spark then tries to stage as a file path, hence the empty-path exception.
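
With the extra space removed, the line would read:

<spark-opts>--executor-cores 2 --driver-memory 5g --num-executors 2 --executor-memory 5g</spark-opts>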

Upvotes: 2

Erik Schmiegelow

Reputation: 2759

"Can not create a Path from an empty string" <- Spark is trying to stage your code, but your Oozie workflow doesn't provide a usable jar path, hence the empty-path exception.

Providing the jar path in Oozie will fix this.
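
For instance, using the path already in the workflow above, a fully qualified <jar> element might look like this (the ${nameNode} prefix is a common convention; adjust to your cluster):

<jar>${nameNode}/user/hue/oozie/workspaces/tl_test/lib/testOozie1.jar</jar>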

Upvotes: 1
