Reputation: 486
How do you use Play Framework and a Spark cluster in development?
I can run any Spark app with the master set to local[*].
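For context, a rough sketch of the kind of setup involved, assuming the SparkContext is built directly inside the Play app (the app name and master URL are placeholders, not my actual values):

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: app name and master URL are placeholders.
val conf = new SparkConf()
  .setAppName("play-spark-dev")
  .setMaster("local[*]")               // works fine in development
  // .setMaster("spark://master:7077") // switching to the cluster triggers the failure below

val sc = new SparkContext(conf)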
But if I set it to run on the cluster, I get this:
play.api.Application$$anon$1: Execution exception[[SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 5, 192.168.1.239): java.lang.ClassNotFoundException: controllers.Application$$anonfun$test$1$$anonfun$2
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I understand the problem is that the distributed workers don't have my app classes loaded.
So how do you use Spark under Lightbend Activator? It doesn't make any sense to submit a Play Framework app via the command line; the app is supposed to run under Play so you can see the results in the browser.
I downloaded the Lightbend sample Spark applications, and they all use local[*] for the Spark master. If I switch to a spark://master:port URL, they all crash with the same problem.
Does anyone know how to fix this? Thanks in advance.
Upvotes: 1
Views: 860
Reputation: 486
I'm sorry, folks. This is explained right in the documentation.
Under the Advanced Dependency Management section, it explains how submitted JARs get distributed to the worker nodes.
From there it was a matter of translating the --jars command-line option into a call to .addJar on the SparkContext.
Generate the JAR via activator dist (it ends up under target/scala-2.<version>/), then pass the path to that file to addJar.
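A minimal sketch of what that looks like; the app name, master URL, and JAR file name below are placeholders, so substitute whatever activator dist produced for your own project:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("play-spark-dev")        // placeholder app name
  .setMaster("spark://master:7077")    // your cluster's master URL

val sc = new SparkContext(conf)

// Ship the application classes to the executors, the same effect as spark-submit's --jars option.
// The path and file name are placeholders; use the JAR that activator dist generated for your project.
sc.addJar("target/scala-2.11/my-play-app_2.11-1.0-SNAPSHOT.jar")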
Works perfectly now.
The only problem is that in development, Play restarts the application when you change files, reusing the same JVM, which produces a Spark error about having two contexts in one JVM. So you need to restart the app cold in order to test changes. A minor nuisance, considering the power of Spark under Play. Cheers!
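If the cold restarts get tedious, one thing that may help is stopping the SparkContext when Play stops the application, so the reloaded instance can create a fresh one. A rough sketch, assuming Play 2.4+ dependency injection and its ApplicationLifecycle stop hooks (class name and Spark settings are illustrative, and I haven't verified this in my own setup):

import javax.inject.{Inject, Singleton}
import scala.concurrent.Future
import org.apache.spark.{SparkConf, SparkContext}
import play.api.inject.ApplicationLifecycle

@Singleton
class SparkContextProvider @Inject()(lifecycle: ApplicationLifecycle) {

  // Placeholder app name and master URL, same as above; add sc.addJar(...) as shown earlier.
  val sc: SparkContext = new SparkContext(
    new SparkConf()
      .setAppName("play-spark-dev")
      .setMaster("spark://master:7077"))

  // Stop the context when Play stops the application, so the next dev-mode reload
  // in the same JVM can create a fresh one instead of hitting the "two contexts" error.
  lifecycle.addStopHook { () =>
    Future.successful(sc.stop())
  }
}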
Upvotes: 2