Reputation: 63062
There has been plenty of chatter about Spark 2.0 supporting multiple SparkContexts. A configuration property to support this has been around much longer, but it has never actually been effective.
In $SPARK_HOME/conf/spark-defaults.conf:
spark.driver.allowMultipleContexts true
Let's verify that the property was recognized:
scala> println(s"allowMultiCtx = ${sc.getConf.get("spark.driver.allowMultipleContexts")}")
allowMultiCtx = true
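The same property can also be set programmatically instead of through spark-defaults.conf. A minimal sketch for a standalone application (outside spark-shell, where sc already exists); the master and app name here are illustrative:
import org.apache.spark.{SparkConf, SparkContext}

// Alternative to spark-defaults.conf: set the flag directly on the SparkConf
val conf = new SparkConf()
  .setMaster("local[1]")
  .setAppName("multi-ctx-demo")   // illustrative app name
  .set("spark.driver.allowMultipleContexts", "true")

val sc = new SparkContext(conf)
println(sc.getConf.get("spark.driver.allowMultipleContexts"))  // prints: true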
Here is a small proof-of-concept program for it:
import org.apache.spark._
import org.apache.spark.streaming._
println(s"allowMultiCtx = ${sc.getConf.get("spark.driver.allowMultipleContexts")}")
def createAndStartFileStream(dir: String) = {
  // Each invocation deliberately creates its own SparkContext
  val sc = new SparkContext("local[1]", s"Spark-$dir" /*,conf*/)
  val ssc = new StreamingContext(sc, Seconds(4))
  val dstream = ssc.textFileStream(dir)
  val valuesCounts = dstream.countByValue()
  valuesCounts.print()  // a DStream needs at least one output operation
  ssc.start()
  ssc.awaitTermination()
}
val dirs = Seq("data10m", "data50m", "dataSmall").map { d =>
  s"/shared/demo/data/$d"
}
dirs.foreach { d =>
  createAndStartFileStream(d)
}
However, attempts to use that capability are not succeeding:
16/08/14 11:38:55 WARN SparkContext: Multiple running SparkContexts detected
in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in
this JVM (see SPARK-2243). To ignore this error,
set spark.driver.allowMultipleContexts = true.
The currently running SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:814)
org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
Does anyone have any insight into how to use multiple contexts?
Upvotes: 1
Views: 524
Reputation: 63062
Per @LostInOverflow, this feature will not be fixed. Here is the relevant information from that JIRA:
SPARK-2243 Support multiple SparkContexts in the same JVM
https://issues.apache.org/jira/browse/SPARK-2243
Sean Owen added a comment - 16/Jan/16 17:35

You say you're concerned with over-utilizing a cluster for steps that don't require much resource. This is what dynamic allocation is for: the number of executors increases and decreases with load. If one context is already using all cluster resources, yes, that doesn't do anything. But then, neither does a second context; the cluster is already fully used. I don't know what overhead you're referring to, but certainly one context running N jobs is busier than N contexts running N jobs. Its overhead is higher, but the total overhead is lower. This is more an effect than a cause that would make you choose one architecture over another.

Generally, Spark has always assumed one context per JVM and I don't see that changing, which is why I finally closed this. I don't see any support for making this happen.
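Following that recommendation, the supported pattern is a single context running several jobs. A minimal sketch of consuming the same three directories with one SparkContext and one StreamingContext (the directory paths are reused from the question above; the master and app name are illustrative):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[*]").setAppName("single-ctx-streams")
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(4))

// One StreamingContext can host several input DStreams
val dirs = Seq("data10m", "data50m", "dataSmall").map(d => s"/shared/demo/data/$d")
dirs.foreach { dir =>
  val counts = ssc.textFileStream(dir).countByValue()
  counts.print()  // any output operation will do
}

ssc.start()
ssc.awaitTermination()

Each textFileStream becomes a separate input DStream scheduled by the same context, so the spark.driver.allowMultipleContexts flag is not needed at all.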
Upvotes: 3