Yasir Arfat

Reputation: 645

Execute Apache Spark (Scala) code in Bash script

I am a newbie to Spark and Scala. I want to execute some Spark code from inside a bash script, so I wrote the following code.

The Scala code is written in a separate .scala file as follows.

Scala Code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    // Configure and start the Spark context
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // Print the two command-line arguments passed in from the bash script
    println("x=" + args(0) + ", y=" + args(1))
    sc.stop()
  }
}

This is the bash script that invokes the Apache Spark / Scala code.

Bash Code:

#!/usr/bin/env bash
ABsize=File_size1
ADsize=File_size2
for i in $(seq 2 "$ABsize")
do
    for j in $(seq 2 "$ADsize")
    do
        # Read the i-th / j-th line of each input file
        Abi=$(sed -n "${i}p" File_Path1)
        Adj=$(sed -n "${j}p" File_Path2)
        scala SimpleApp.scala "$Abi" "$Adj"
    done
done

But then I get the following errors.

Errors:

error: object apache is not a member of package org
import org.apache.spark.SparkContext
           ^
error: object apache is not a member of package org
import org.apache.spark.SparkContext._
           ^
error: object apache is not a member of package org
import org.apache.spark.SparkConf
           ^
error: not found: type SparkConf
    val conf = new SparkConf().setAppName("Simple Application")
                   ^
error: not found: type SparkContext

The above code works perfectly if the Scala file is written without any Spark functions (that is, a pure Scala file), but fails when there are Apache Spark imports.

What would be a good way to run and execute this from a bash script? Will I have to call spark-shell to execute the code?

Upvotes: 0

Views: 2605

Answers (1)

FaigB

Reputation: 2281

Set up Spark with the appropriate environment variables and, as @puhlen said, run the job with spark-submit instead of the bare scala command:

spark-submit --class SimpleApp simple-project_2.11-1.0.jar $Abi $Adj
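For context, a minimal sketch of what that might look like end to end, assuming the job is packaged with sbt into the jar named above and that SPARK_HOME points at your Spark installation (the build.sbt contents, Spark/Scala versions, and paths below are illustrative assumptions, not from the question):

# build.sbt (hypothetical minimal project definition)
#   name := "simple-project"
#   version := "1.0"
#   scalaVersion := "2.11.8"
#   libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"

#!/usr/bin/env bash
# Package the job once, outside the loop; this produces
# target/scala-2.11/simple-project_2.11-1.0.jar
sbt package

for i in $(seq 2 "$ABsize")
do
    for j in $(seq 2 "$ADsize")
    do
        Abi=$(sed -n "${i}p" File_Path1)
        Adj=$(sed -n "${j}p" File_Path2)
        # Submit the packaged jar instead of invoking the scala compiler;
        # spark-submit puts the Spark jars on the classpath, so the
        # org.apache.spark imports resolve.
        "$SPARK_HOME"/bin/spark-submit \
            --class SimpleApp \
            target/scala-2.11/simple-project_2.11-1.0.jar "$Abi" "$Adj"
    done
done

The jar is built once before the loop, and spark-submit is then called once per pair of arguments.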

Upvotes: 1
