Reputation: 1071
I have the following Play for Scala controller that wraps Spark. At the end of the method I close the context to avoid the problem of having more than one context active in the same JVM:
import org.apache.spark.{SparkConf, SparkContext}
import play.api.mvc.{Action, Controller}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

class Test4 extends Controller {
  def test4 = Action.async { request =>
    val conf = new SparkConf().setAppName("AppTest").setMaster("local[2]")
      .set("spark.executor.memory", "1g")
    val sc = new SparkContext(conf)
    val rawData = sc.textFile("c:\\spark\\data.csv")
    val data = rawData.map(line => line.split(',').map(_.toDouble))
    val str = "count: " + data.count()
    sc.stop()
    Future { Ok(str) }
  }
}
The problem I have is that I don't know how to make this code work with concurrent requests, as two users may access the same controller method at the same time.
UPDATE
What I'm thinking is to have N Scala programs receive messages through JMS (using ActiveMQ). Each Scala program would have its own Spark session and receive messages from Play; the programs would process requests sequentially as they read from their queues. Does this make sense? Are there any other best practices for integrating Play and Spark?
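Roughly, each worker program would look something like this (the broker URL, queue name, and what is done with the result are only placeholders to illustrate the idea):

import javax.jms.{Session, TextMessage}
import org.apache.activemq.ActiveMQConnectionFactory
import org.apache.spark.sql.SparkSession

object SparkWorker {
  def main(args: Array[String]): Unit = {
    // one Spark session per worker process
    val spark = SparkSession.builder()
      .appName("AppTestWorker")
      .master("local[2]")
      .config("spark.executor.memory", "1g")
      .getOrCreate()

    // read requests sequentially from an ActiveMQ queue
    val factory = new ActiveMQConnectionFactory("tcp://localhost:61616")
    val connection = factory.createConnection()
    connection.start()
    val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
    val consumer = session.createConsumer(session.createQueue("spark-requests"))

    while (true) {
      val msg = consumer.receive().asInstanceOf[TextMessage]
      val path = msg.getText // e.g. a CSV path sent by Play
      val count = spark.read.textFile(path).count()
      println(s"count: $count") // reply or persist the result here
    }
  }
}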
Upvotes: 0
Views: 437
Reputation: 3863
I don't think it is a good idea to execute Spark jobs from a REST API. If you just want to parallelize within your local JVM, it doesn't make sense to use Spark, since it is designed for distributed computing. It is also not designed to be an operational database, and it won't scale well when you execute several concurrent queries on the same cluster.
Anyway, if you still want to execute concurrent Spark queries from the same JVM, you should probably use client mode to run the queries on an external cluster. It is not possible to launch more than one session per JVM, so I would suggest that you share the session in your service and stop it only when the service itself shuts down.
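For example, something along these lines (just a sketch assuming Play's dependency injection; SparkService is an illustrative name):

import javax.inject.{Inject, Singleton}
import scala.concurrent.Future
import org.apache.spark.sql.SparkSession
import play.api.inject.ApplicationLifecycle

// one shared session for the whole application
@Singleton
class SparkService @Inject()(lifecycle: ApplicationLifecycle) {
  val spark: SparkSession = SparkSession.builder()
    .appName("AppTest")
    .master("local[2]") // or point it at an external cluster in client mode
    .config("spark.executor.memory", "1g")
    .getOrCreate()

  // stop the session only when the application itself shuts down
  lifecycle.addStopHook { () =>
    Future.successful(spark.stop())
  }
}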
Upvotes: 0
Reputation: 3692
It's better to just move the Spark context into a separate object:
import org.apache.spark.{SparkConf, SparkContext}

// a simple holder object: one shared context for the whole application
object SparkContext {
  val conf = new SparkConf().setAppName("AppTest").setMaster("local[2]")
    .set("spark.executor.memory", "1g")
  val sc = new SparkContext(conf)
}
Otherwise, with your current design a new Spark context is created for every request, and Spark does not allow more than one active context in the same JVM.
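The controller then just reuses that shared context instead of creating one per request, for example (a sketch based on the code in the question, assuming the object above is in the same package):

import play.api.mvc.{Action, Controller}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

class Test4 extends Controller {
  def test4 = Action.async { request =>
    // reuse the single shared context; do not create or stop it per request
    val rawData = SparkContext.sc.textFile("c:\\spark\\data.csv")
    val data = rawData.map(line => line.split(',').map(_.toDouble))
    Future { Ok("count: " + data.count()) }
  }
}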
If we talk about best practices, it's really not a good idea to use Spark inside a Play project. A better approach is to create a microservice that runs the Spark application and have the Play application call that microservice; this type of architecture is more flexible, scalable, and robust.
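On the Play side that can be as simple as forwarding the request to the microservice with the WS client, for example (just a sketch; the service URL and endpoint are placeholders):

import javax.inject.Inject
import scala.concurrent.ExecutionContext
import play.api.libs.ws.WSClient
import play.api.mvc.{Action, Controller}

class Test4 @Inject()(ws: WSClient)(implicit ec: ExecutionContext) extends Controller {
  def test4 = Action.async { request =>
    // the Spark job runs in the separate service; Play only calls it over HTTP
    ws.url("http://spark-service:9090/count").get()
      .map(response => Ok(response.body))
  }
}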
Upvotes: 1