Reputation: 3591
My application uses Apache Spark for background data processing and the Play Framework for the front-end interface.
The recommended way to use the Play Framework in a Scala application is through Typesafe Activator.
Now, the problem is that I want to deploy this application to a spark cluster.
There is good documentation on how to deploy an sbt application to a cluster using spark-submit, but what should I do with an Activator-based application?
Please note that I understand how to use Spark with Activator (see this link); my question is specifically about deploying the application to a cluster such as EC2.
The application, by the way, is written in Scala.
I'm open to suggestions such as decoupling the two applications and letting them interact, except that I don't know how to do that, so if you suggest it, a reference would be very much appreciated.
Update:
I have tried adding the dependencies to the build.sbt file in an Activator project, and I get the following error:
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[error] impossible to get artifacts when data has not been loaded. IvyNode = org.slf4j#slf4j-api;1.6.1
[trace] Stack trace suppressed: run last *:update for the full output.
[error] (*:update) java.lang.IllegalStateException: impossible to get artifacts when data has not been loaded. IvyNode = org.slf4j#slf4j-api;1.6.1
Here is how I added the dependencies in the build.sbt file:
// All the Apache Spark dependencies
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10"      % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-sql_2.10"       % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-streaming_2.10" % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-mllib_2.10"     % sparkVersion % "provided" withSources()
)
and the resolvers:
// All the Apache Spark resolvers
resolvers ++= Seq(
  "Apache repo" at "https://repository.apache.org/content/repositories/releases",
  "Local Repo"  at Path.userHome.asFile.toURI.toURL + "/.m2/repository", // Added local repository
  Resolver.mavenLocal
)
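For reference, sparkVersion is a val defined elsewhere in the same build.sbt, along these lines (the version number below is only a placeholder, not the actual value from my build):

// Spark version used throughout the build (placeholder value for illustration)
val sparkVersion = "1.2.0"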
Any workaround?
Upvotes: 1
Views: 400
Reputation: 3591
It turns out that the problem is a dependency conflict (on slf4j) between the Play Framework and Apache Spark, which can easily be resolved by excluding the conflicting dependency from the Spark dependency list.
// All the Apache Spark dependencies
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % sparkVersion % "provided" withSources() excludeAll(
    ExclusionRule(organization = "org.slf4j")
  ),
  "org.apache.spark" % "spark-sql_2.10"       % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-streaming_2.10" % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-mllib_2.10"     % sparkVersion % "provided" withSources()
)
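If the same slf4j conflict surfaced through the other Spark modules as well, the exclusion could be applied to all of them at once. A minimal sketch of that variant (the module list and layout here are illustrative, and sparkVersion is assumed to be defined elsewhere in build.sbt):

// Apply the slf4j exclusion uniformly to every Spark module used by the project
val sparkModules = Seq("spark-core", "spark-sql", "spark-streaming", "spark-mllib")

libraryDependencies ++= sparkModules.map { m =>
  ("org.apache.spark" % s"${m}_2.10" % sparkVersion % "provided")
    .excludeAll(ExclusionRule(organization = "org.slf4j"))
    .withSources()
}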
Also, to use Spark from the sbt console, one can add the following to the build.sbt file so that the basic Spark packages are imported directly.
/// console
// define the statements initially evaluated when entering 'console', 'consoleQuick', or 'consoleProject'
// but still keep the console settings in the sbt-spark-package plugin
// If you want to use yarn-client for spark cluster mode, override the environment variable
// SPARK_MODE=yarn-client <cmd>
val sparkMode = sys.env.getOrElse("SPARK_MODE", "local[2]")
initialCommands in console :=
s"""
|import org.apache.spark.SparkConf
|import org.apache.spark.SparkContext
|import org.apache.spark.SparkContext._
|
|@transient val sc = new SparkContext(
| new SparkConf()
| .setMaster("$sparkMode")
| .setAppName("Console test"))
|implicit def sparkContext = sc
|import sc._
|
|@transient val sqlc = new org.apache.spark.sql.SQLContext(sc)
|implicit def sqlContext = sqlc
|import sqlc._
|
|def time[T](f: => T): T = {
| import System.{currentTimeMillis => now}
| val start = now
| try { f } finally { println("Elapsed: " + (now - start)/1000.0 + " s") }
|}
|
|""".stripMargin
cleanupCommands in console :=
s"""
|sc.stop()
""".stripMargin
Now, the major issue is deploying the application. With the Play Framework, launching the application across multiple nodes of a cluster is troublesome, since the HTTP request handler must be reachable at one specific URL. This can be solved by starting the Play Framework instance on the master node and pointing that URL to its IP.
Upvotes: 0
Reputation: 8477
Activator is just sbt with a few additions, so everything you read about sbt applies. You can also use plain sbt with your project if you like; it's the same thing unless you are using the "new" or "ui" commands.
The short answer to your question is probably to use the sbt-native-packager plugin and its "stage" task; the play docs have a deployment section that describes this.
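For completeness, a minimal sketch of what that looks like on the build side (the plugin version below is illustrative; a Play project typically already pulls in the packager through the Play sbt plugin, while a plain sbt/Activator project can enable it explicitly):

// project/plugins.sbt
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.0.0")

// build.sbt -- enable the packaging archetype so that "stage" produces a start script
enablePlugins(JavaAppPackaging)

Running "stage" then lays out the application with a launcher script under target/universal/stage/bin/, which can be copied to and run on a cluster node.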
Upvotes: 1