Ujjwal SIddharth

Reputation: 137

Schedule a Scala file using Oozie?

How do I run or schedule, with Oozie, a .scala file that I normally run through spark-shell?

I run this file with the command spark-shell -i combined.scala.

Is there something specific for this in Oozie, similar to how Pig jobs are scheduled?

Following David's suggestion, I created this workflow XML:

<workflow-app xmlns='uri:oozie:workflow:0.2' name='oozie-java-spark-wf'>
    <start to='java-spark' />

    <action name='java-spark'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>Spark Patent Citation</name>
            <class>org.apache.spark.repl.Main</class>
            <jar></jar>
            <arg>-i</arg>
            <arg>${nameNode}/user/hdfs/scala_file/combined.scala</arg>
        </spark>

        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Spark Java PatentCitation failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

I am getting an error like this:

Error Code JA018 Error Message Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, null

Where am I going wrong?

Upvotes: 1

Views: 3098

Answers (1)

David Griffin

Reputation: 13927

There is a Spark Action for Oozie:

Oozie Spark Action

spark-shell is just a wrapper around org.apache.spark.repl.Main, so specify that as the Spark main class (the <class> element of the action) and pass -i and combined.scala as <arg/> values.
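As a rough illustration of that suggestion, a Spark action might look something like the sketch below. It assumes the uri:oozie:spark-action:0.1 schema. The action name and the <jar> path are hypothetical placeholders, and the script location is the HDFS path from the question. I have not verified that the REPL runs non-interactively this way on any given cluster, so treat it as a starting point rather than a tested configuration.

```xml
<action name="spark-repl-script"> <!-- hypothetical action name -->
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>Run combined.scala</name>
        <!-- The class that spark-shell wraps -->
        <class>org.apache.spark.repl.Main</class>
        <!-- Placeholder (assumption): point this at whichever jar on your
             cluster contains org.apache.spark.repl.Main -->
        <jar>${nameNode}/path/to/jar-containing-spark-repl.jar</jar>
        <!-- Same arguments as: spark-shell -i combined.scala -->
        <arg>-i</arg>
        <arg>${nameNode}/user/hdfs/scala_file/combined.scala</arg>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

The two <arg> elements are passed to the main class in order, so they mirror the command-line form one to one.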

Upvotes: 2
