Reputation: 337
I have some problems understanding what kind of error Oozie is returning to me. Explanation:
I created a very simple "job" in Oozie; the XML is this:
<workflow-app name="Massimiliano" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-2adf"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-2adf">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>local[*]</master>
            <mode>client</mode>
            <name>MySpark</name>
            <class>org.XXX.SimpleApp</class>
            <jar>${nameNode}/user/${wf:user()}//prova_spark/SimpleApp1.jar</jar>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
The job.properties is the following:
nameNode=hdfs://10.203.17.90:8020
jobTracker=10.203.17.90:8021
master=local[*]
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/hdfs/user/oozie/share/lib/lib_20160628182408/spark
I have tried again and again to change all the parameters, with absolutely no result.
The error I get is this:
Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [101]
The name node is the master node; I don't know if oozie.wf.application.path is set correctly.
More details of the error:
hdfs://nameservice1/user/hdfs//prova_spark/SimpleApp1.jar
=================================================================
>>> Invoking Spark class now >>>
Intercepting System.exit(101)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [101]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://nameservice1/user/hdfs/oozie-oozi/0000117-160804173605999-oozie-oozi-W/spark-2adf--spark/action-data.seq
Oozie Launcher ends
The path hdfs://nameservice1/user/hdfs//prova_spark/SimpleApp1.jar
is correct! But I don't know where I have to look to resolve this problem.
Can you help me, please?
Upvotes: 3
Views: 3163
Reputation: 1298
I also encountered a similar issue, and it turned out that the jar path
<jar>${nameNode}/user/${wf:user()}//prova_spark/SimpleApp1.jar</jar>
should be your local path.
You don't need to put your Spark jar into HDFS; just use the one on your Linux system.
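For example, the element would look something like this; the local directory is hypothetical and should be wherever the jar actually sits on the machine:
<jar>/home/hdfs/prova_spark/SimpleApp1.jar</jar>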
This solution solved my problem, so I am posting it here.
Upvotes: 1
Reputation: 337
I resolved it in this manner: for reasons I really don't understand, the Spark job with Oozie doesn't work very well. I say "doesn't work very well" because all the errors that show up in syslog and stderr are very general (the error descriptions are almost incomprehensible), so it is very difficult to resolve each problem and every time you end up walking in the dark.
So I changed my approach and used a shell job instead, where I put this code:
# build a timestamp for logging, e.g. 2016-08-29_10-15-00
d=`date +"%Y-%m-%d_%T" | sed 's/:/-/g'`
echo "START_TIMESTAMP=$d"
# run as the hdfs user and submit the job to YARN in cluster mode,
# passing the base directory as the first argument to the application
export HADOOP_USER_NAME=hdfs
spark-submit --master yarn --deploy-mode cluster --class org.XXX.TryApp TryApp.jar "/user/hue/oozie/workspaces/hue-oozie-1471949509.25"
In practice I wrote a "middle-ground solution", and in doing so I understood Hadoop with Spark a little bit better.
I launched the Spark job with YARN in cluster mode, and I pass the path of the file to my jar. In my Scala code, these are the main lines:
import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.sql.hive.HiveContext
import org.apache.hadoop.fs.{ FileSystem, Path }

object TryApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("TryApp")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    // URI of the default filesystem, e.g. hdfs://nameservice1
    val fs = FileSystem.get(sc.hadoopConfiguration).getUri
  }
}
Now, given that the base path is hdfs://nameservice1, it was very simple to work out the rest of the path, and I passed the other piece through the variable args(0).
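A minimal sketch of that composition, continuing from the code above (the file name input.txt is only an illustrative assumption):
val base = fs.toString                  // e.g. hdfs://nameservice1
val inputDir = base + args(0)           // e.g. hdfs://nameservice1/user/hue/oozie/workspaces/hue-oozie-1471949509.25
val data = sc.textFile(inputDir + "/input.txt")  // read a hypothetical file under that directory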
In the Hue interface, you have to specify 3 things: the shell command (action.sh) and, as files, action.sh itself and the jar that we have to launch through Oozie (a sketch of the resulting action is below). This works for me, and I think it is a better solution because even if you have some problem, the error output is very clear and you can correct your code or your job.
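For reference, the shell action that ends up in the generated workflow looks roughly like this; the action name and the exact schema version are assumptions:
<action name="shell-spark">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>action.sh</exec>
        <file>action.sh#action.sh</file>
        <file>TryApp.jar#TryApp.jar</file>
    </shell>
    <ok to="End"/>
    <error to="Kill"/>
</action>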
I hope this is helpful to someone!
Upvotes: 1
Reputation: 124
> Step 1. First, capture the Spark and related jars used to execute the job. One way would be to run it with spark-submit at the command line and note the jars it uses.
> Step 2. Create a lib folder, if it does not exist, in the workflow path.
> Step 3. Place all the jars collected in step 1 in that lib folder (see the sketch after these steps).
> Step 4. Run the workflow.
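A sketch of steps 2 and 3 with the HDFS CLI; the workflow directory /user/hdfs/prova_spark and the jar names are assumptions, so substitute the directory that holds your workflow.xml and the jars you actually collected:
# create the lib directory next to workflow.xml (step 2)
hdfs dfs -mkdir -p /user/hdfs/prova_spark/lib
# copy the jars collected in step 1 into it (step 3)
hdfs dfs -put spark-assembly-*.jar SimpleApp1.jar /user/hdfs/prova_spark/lib/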
I think this should fix it. However, I would be curious to know if it still doesn't work.
Upvotes: 2