Ayush Mishra

Reputation: 593

Not able to add a Spark job on an EC2 cluster

I am new to Spark. I am able to launch, manage, and shut down Spark clusters on Amazon EC2 by following http://spark.incubator.apache.org/docs/0.7.3/ec2-scripts.html.

But I am not able to run the job below on the cluster.

package spark.examples

import spark.SparkContext
import SparkContext._

object SimpleJob {

  def main(args: Array[String]) {
    val logFile = "< Amazon S3 file url>"
    val sc = new SparkContext(
      "spark://<Host Name>:7077", 
      "Simple Job",
      System.getenv("SPARK_HOME"), Seq("<Jar Address>")
    )
    val logData = sc.textFile(logFile)
    val numsa = logData.filter(line => line.contains("a")).count
    val numsb = logData.filter(line => line.contains("b")).count
    println("total a : %s, total b : %s".format(numsa, numsb))
  }

}

I created SimpleJob.scala and added it to the spark.examples package in my local Spark directory. After that I ran the command:

./spark-ec2 -k <keypair> -i <key-file> login <cluster-name>

The cluster starts and I am able to log in to it, but I don't know how to add and run this job on the EC2 cluster.

Upvotes: 1

Views: 1513

Answers (2)

Ambarish Hazarnis

Reputation: 186

If you are able to run it locally, then most probably the issue is that the Spark workers are not able to access your jar. Let me know if the following steps work:

  1. Export your code into a jar file (I usually use Eclipse but you can use sbt too)

  2. Run the following command on the master:

    SPARK_CLASSPATH=<path/to/jar/file> ./run <Class> [arguments]
    

For example,

    SPARK_CLASSPATH=Simple.jar ./run spark.examples.SimpleJob

Also check in the Spark master UI that your workers are alive. Hope this helps!
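
One way to rule out a packaging problem before looking at the workers is to confirm that the exported jar really contains the job class. The sketch below is only an illustration using the standard JDK `java.util.jar.JarFile` API; `JarCheck` and `jarContainsClass` are hypothetical names, and the defaults `Simple.jar` and `spark.examples.SimpleJob` are taken from the example command above. It does not prove that the workers can reach the jar, only that the jar on the machine where you run it is readable and includes the class.

```scala
import java.io.{File, IOException}
import java.util.jar.JarFile

object JarCheck {

  /** True if jarPath is a readable jar that contains the .class entry for className. */
  def jarContainsClass(jarPath: String, className: String): Boolean = {
    val file = new File(jarPath)
    if (!file.isFile || !file.canRead) {
      false
    } else {
      // A fully qualified class name maps to a path inside the jar
      val entryName = className.replace('.', '/') + ".class"
      try {
        val jar = new JarFile(file)
        try {
          jar.getEntry(entryName) != null
        } finally {
          jar.close()
        }
      } catch {
        // Not a valid jar/zip file, or it could not be read
        case _: IOException => false
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed defaults, copied from the example command in this answer
    val jarPath = if (args.length > 0) args(0) else "Simple.jar"
    val className = if (args.length > 1) args(1) else "spark.examples.SimpleJob"
    println(jarPath + " contains " + className + ": " + jarContainsClass(jarPath, className))
  }
}
```

If this reports false for your jar, the export step is the first thing to revisit before checking the workers.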

Upvotes: 1

elyase

Reputation: 40973

I suggest you first try to run it locally; once you achieve that, you will have a better idea of the process involved. Follow the instructions here in the section "A standalone job in Scala". Then copy the script to the remote machine and run it from there with:

./run spark.examples.SimpleJob

If you try to connect to your remote Spark cluster from the local script with:

MASTER=spark://ec2-174-129-181-44.compute-1.amazonaws.com:7077 ./run spark.examples.SimpleJob

the most probable result is a connection error, as port 7077 is blocked by default in EC2.
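
To tell a blocked port apart from other failures, a plain TCP connection attempt from your local machine can help. This is a minimal sketch that assumes nothing beyond the JDK socket classes; `PortCheck` and `canConnect` are hypothetical names, and the host argument is whatever public DNS name your master has. It only checks that a TCP connection can be opened, not that a Spark master is actually answering on that port.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object PortCheck {

  /** True if a TCP connection to host:port can be opened within timeoutMillis. */
  def canConnect(host: String, port: Int, timeoutMillis: Int): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis)
      true
    } catch {
      // Covers refused connections, timeouts, and unresolvable host names
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed defaults: pass your master's host name as the first argument
    val host = if (args.length > 0) args(0) else "localhost"
    val port = if (args.length > 1) args(1).toInt else 7077
    println(host + ":" + port + " reachable: " + canConnect(host, port, 3000))
  }
}
```

If the check fails from your machine but succeeds when run on the master itself, the EC2 security group rules are a likely place to look.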

Upvotes: 1
