Reputation: 137
How do I run/schedule, via Oozie, a .scala file that I normally run with the spark-shell command?
I currently run the file like this:
spark-shell -i combined.scala
I was wondering if there is something specific for this, like the way we schedule Pig jobs in Oozie.
Following David's suggestion, I created this XML:
<workflow-app xmlns='uri:oozie:workflow:0.2' name='oozie-java-spark-wf'>
    <start to='java-spark' />
    <action name='java-spark'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>Spark Patent Citation</name>
            <class>org.apache.spark.repl.Main</class>
            <jar></jar>
            <arg>-i</arg>
            <arg>${nameNode}/user/hdfs/scala_file/combined.scala</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark Java PatentCitation failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
I am getting this error:
Error Code: JA018
Error Message: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, null
Where am I going wrong?
Upvotes: 1
Views: 3098
Reputation: 13927
There is a Spark Action for Oozie.
spark-shell is just a wrapper around org.apache.spark.repl.Main, so specify that as the Spark main class, and pass in -i and combined.scala as <arg/> values.
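A minimal sketch of the Spark action under that suggestion. Note the <jar> element is left empty in the question's workflow, which is one plausible cause of the null exception from SparkMain; the jar path below is an assumption and depends on your Spark distribution, while the script path is taken from the question:

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>Spark Shell Script</name>
    <!-- spark-shell is a wrapper around the REPL main class -->
    <class>org.apache.spark.repl.Main</class>
    <!-- hypothetical path: point this at the Spark REPL jar shipped with your install -->
    <jar>${nameNode}/user/hdfs/lib/spark-repl.jar</jar>
    <!-- the arguments spark-shell would normally receive -->
    <arg>-i</arg>
    <arg>${nameNode}/user/hdfs/scala_file/combined.scala</arg>
</spark>
```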
Upvotes: 2