Reputation: 3591
My application uses Apache Spark for background data processing and the Play Framework for the front-end interface.
The recommended way to use the Play Framework in a Scala application is through Typesafe Activator.
Now, the problem is that I want to deploy this application to a spark cluster.
There is good documentation on how to deploy an sbt application to a cluster using spark-submit, but what should I do with an Activator-based application?
Please note that I understand how to use Spark with Activator (see this link); my question is specifically about deploying the application to a cluster such as EC2.
The application, by the way, is written in Scala.
I'm open to suggestions such as decoupling the two applications and letting them interact, except that I don't know how to do that, so if you suggest it, a reference would be very much appreciated.
Update:
I have tried adding the dependencies to the build.sbt file in an Activator project, and I get the following error:
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[error] impossible to get artifacts when data has not been loaded. IvyNode = org.slf4j#slf4j-api;1.6.1
[trace] Stack trace suppressed: run last *:update for the full output.
[error] (*:update) java.lang.IllegalStateException: impossible to get artifacts when data has not been loaded. IvyNode = org.slf4j#slf4j-api;1.6.1
Here is how I added the dependencies in the build.sbt file:
// All the Apache Spark dependencies
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10"      % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-sql_2.10"       % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-streaming_2.10" % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-mllib_2.10"     % sparkVersion % "provided" withSources()
)
and the resolvers:
// All the Apache Spark resolvers
resolvers ++= Seq(
  "Apache repo" at "https://repository.apache.org/content/repositories/releases",
  "Local Repo"  at Path.userHome.asFile.toURI.toURL + "/.m2/repository", // Added local repository
  Resolver.mavenLocal
)
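For reference, sparkVersion is a val defined elsewhere in the same build.sbt, along these lines (the version number below is only a placeholder, not the actual value from my build):

// Spark version used throughout the build (placeholder value for illustration)
val sparkVersion = "1.2.0"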
Any workaround?
Upvotes: 1
Views: 400
Reputation: 3591
It turns out that the problem is a dependency conflict (on slf4j) between the Play Framework and Apache Spark, which can easily be resolved by excluding the conflicting dependency from the Spark dependency list.
// All the Apache Spark dependencies
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % sparkVersion % "provided" withSources() excludeAll(
    ExclusionRule(organization = "org.slf4j")
  ),
  "org.apache.spark" % "spark-sql_2.10"       % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-streaming_2.10" % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-mllib_2.10"     % sparkVersion % "provided" withSources()
)
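If the same slf4j conflict surfaced through the other Spark modules as well, the exclusion could be applied to all of them at once. A minimal sketch of that variant (the module list and layout here are illustrative, and sparkVersion is assumed to be defined elsewhere in build.sbt):

// Apply the slf4j exclusion uniformly to every Spark module used by the project
val sparkModules = Seq("spark-core", "spark-sql", "spark-streaming", "spark-mllib")

libraryDependencies ++= sparkModules.map { m =>
  ("org.apache.spark" % s"${m}_2.10" % sparkVersion % "provided")
    .excludeAll(ExclusionRule(organization = "org.slf4j"))
    .withSources()
}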
Also, to use Spark from the sbt console, one can add the following to the build.sbt file so that the basic Spark packages are imported directly.
/// console
// define the statements initially evaluated when entering 'console', 'consoleQuick', or 'consoleProject'
// but still keep the console settings in the sbt-spark-package plugin
// If you want to use yarn-client for spark cluster mode, override the environment variable
// SPARK_MODE=yarn-client <cmd>
val sparkMode = sys.env.getOrElse("SPARK_MODE", "local[2]")
initialCommands in console :=
s"""
|import org.apache.spark.SparkConf
|import org.apache.spark.SparkContext
|import org.apache.spark.SparkContext._
|
|@transient val sc = new SparkContext(
| new SparkConf()
| .setMaster("$sparkMode")
| .setAppName("Console test"))
|implicit def sparkContext = sc
|import sc._
|
|@transient val sqlc = new org.apache.spark.sql.SQLContext(sc)
|implicit def sqlContext = sqlc
|import sqlc._
|
|def time[T](f: => T): T = {
| import System.{currentTimeMillis => now}
| val start = now
| try { f } finally { println("Elapsed: " + (now - start)/1000.0 + " s") }
|}
|
|""".stripMargin
cleanupCommands in console :=
s"""
|sc.stop()
""".stripMargin
Now, the major issue is deploying the application. With the Play Framework, launching the application across multiple nodes of a cluster is troublesome, since the HTTP request handler must be reachable at one specific URL. This can be solved by starting the Play Framework instance on the master node and pointing that URL to its IP.
Upvotes: 0
Reputation: 8477
Activator is just sbt with a few additions, so everything you read about sbt applies. You can also use plain sbt with your project if you like; it's the same thing unless you are using the "new" or "ui" commands.
The short answer to your question is probably to use the sbt-native-packager plugin and its "stage" task; the play docs have a deployment section that describes this.
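For completeness, a minimal sketch of what that looks like on the build side (the plugin version below is illustrative; a Play project typically already pulls in the packager through the Play sbt plugin, while a plain sbt/Activator project can enable it explicitly):

// project/plugins.sbt
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.0.0")

// build.sbt -- enable the packaging archetype so that "stage" produces a start script
enablePlugins(JavaAppPackaging)

Running "stage" then lays out the application with a launcher script under target/universal/stage/bin/, which can be copied to and run on a cluster node.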
Upvotes: 1