Reputation: 161
I'm using Spark 1.3.1 with Scala 2.10.4. I've tried a basic scenario: parallelize an array of 3 strings, then map over them with a variable defined in the driver.
Here is the code:
import org.apache.spark.{SparkConf, SparkContext}

object BasicTest extends App {
  val conf = new SparkConf().setAppName("Simple Application").setMaster("spark://xxxxx:7077")
  val sc = new SparkContext(conf)

  val test = sc.parallelize(Array("a", "b", "c"))
  val a = 5
  test.map(row => row + a).saveAsTextFile("output/basictest/")
}
This piece of code works in local mode; I get the expected output:
a5
b5
c5
But on a real cluster, I get:
a0
b0
c0
I've also tried another version of the code:
import org.apache.spark.{SparkConf, SparkContext}

object BasicTest extends App {
  def test(sc: SparkContext): Unit = {
    val test = sc.parallelize(Array("a", "b", "c"))
    val a = 5
    test.map(row => row + a).saveAsTextFile("output/basictest/")
  }

  val conf = new SparkConf().setAppName("Simple Application").setMaster("spark://xxxxx:7077")
  val sc = new SparkContext(conf)
  test(sc)
}
This works in both cases.
I just need to understand the reasons in each case. Thanks in advance for any advice.
Upvotes: 3
Views: 928
Reputation: 67065
I believe this relates to the use of App. SPARK-4170 was "resolved" only in that Spark now warns against using App:
if (classOf[scala.App].isAssignableFrom(mainClass)) {
  printWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
}
Notes from the ticket:
this bug seems to be an issue with the way Scala has been defined and the differences between what happens at runtime versus compile time with respect to the way App leverages the delayedInit function.
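As I understand the mechanism, with extends App the statement val a = 5 is compiled into a field assignment that is deferred into delayedInit and only executed when main() runs on the driver; on the executors that initialization code never runs, so the captured field still holds its default value 0. In your second snippet, a is a local value inside a method, so the closure captures the value 5 itself and it serializes correctly. Below is a minimal sketch of the usual workaround (keeping your placeholder master URL and output path): use an explicit main() method instead of extending App.

import org.apache.spark.{SparkConf, SparkContext}

object BasicTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("spark://xxxxx:7077")
    val sc = new SparkContext(conf)

    val test = sc.parallelize(Array("a", "b", "c"))
    // A local val: its value (5) is captured in the closure and
    // serialized with it, so executors see the same value as the driver.
    val a = 5
    test.map(row => row + a).saveAsTextFile("output/basictest/")
  }
}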
Upvotes: 2