Reputation: 3696
I am using Spark 1.4.0 on my local system. Whenever I create an RDD and call collect on it through the Scala shell of Spark, it works fine. But when I create a standalone application and call the 'collect' action on the RDD, I don't see the result, even though the Spark messages during the run say that a certain number of bytes have been sent to the driver:
INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1991 bytes result sent to driver
INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1948 bytes result sent to driver
This is the code:
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val rdd1 = sc.textFile("input.txt")
    val rdd2 = rdd1.map(_.split(",")).map(x => (x(0), x(1)))
    rdd2.collect
  }
}
If I change the last statement to the following, it does display the result:
rdd2.collect.foreach(println)
So the question is: why does calling 'collect' by itself not print anything?
Upvotes: 1
Views: 1226
Reputation: 67075
collect by itself in a console app would not display anything, as all it does is return the data. You have to do something to display it, as you are doing with foreach(println). Or do something with it in general, like saving it to disk.
Now, if you were to run that code in the spark-shell (minus the SparkContext creation), then you would indeed see output*, as the shell always calls toString on the objects that are returned.

*Note that toString is not the same as foreach(println), since the shell would truncate the output at some point.
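The distinction can be illustrated without Spark at all. Here is a minimal sketch using a plain Scala collection (the sample data is made up for illustration): an expression that merely returns a value prints nothing in a standalone program, and only an explicit print displays it.

```scala
object CollectDemo {
  def main(args: Array[String]): Unit = {
    // Analogous to rdd1.map(_.split(",")).map(...): build pairs from CSV-style lines
    val lines = Seq("a,1", "b,2")
    val pairs = lines.map(_.split(",")).map(x => (x(0), x(1)))

    // Like rdd2.collect in a standalone app: the value is returned, nothing is printed
    val collected = pairs.toArray

    // Only an explicit print displays the result
    collected.foreach(println)
  }
}
```

In the spark-shell (or the plain Scala REPL), evaluating `pairs` alone would display its toString representation, which is why the shell appears to "print" collect's result while a compiled application does not.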
Upvotes: 2