Dhiraj

Reputation: 3696

'collect' action not displaying result in driver window for Spark standalone application

I am using Spark 1.4.0 on my local system. Whenever I create an RDD and call collect on it through the Scala shell of Spark, it works fine. But when I create a standalone application and call the 'collect' action on the RDD, I don't see the result, even though the Spark messages during the run say that a certain number of bytes were sent to the driver:-

INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1991 bytes result sent to driver
INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1948 bytes result sent to driver

This is the code:-

import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val rdd1 = sc.textFile("input.txt")                       // comma-separated lines
    val rdd2 = rdd1.map(_.split(",")).map(x => (x(0), x(1)))  // pair of the first two fields
    rdd2.collect
  }
}

If I change the last statement to the following, it does display the result:-

rdd2.collect.foreach(println)

So the question is: why does calling 'collect' alone not print anything?

Upvotes: 1

Views: 1226

Answers (1)

Justin Pihony

Reputation: 67075

collect by itself in a standalone application will not display anything, as all it does is return the data to the driver as an Array. You have to do something to display it, as you do with foreach(println). Or do something with it in general, like saving it to disk.
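For example, a minimal sketch (assuming the same comma-separated input.txt; the "output" path is made up) of a few ways to make the data visible, whereas a bare collect only hands the Array back:

val pairs = rdd2.collect()            // Array[(String, String)] held in driver memory
pairs.foreach(println)                // print each pair to stdout
println(pairs.mkString("\n"))         // or format the whole Array yourself
rdd2.saveAsTextFile("output")         // or skip collect and write the RDD to disk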

Now, if you were to run that code in the spark-shell (minus the SparkContext creation), then you would indeed see output*, as the shell always calls toString on the objects that are returned.

*Note that toString is not the same as foreach(println), since the shell will truncate the output at some point.
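For instance, in the shell you would see something along these lines (hypothetical values, depending on what is in your input file):

scala> rdd2.collect
res0: Array[(String, String)] = Array((a,b), (c,d), ...)

In a standalone app that returned value is simply discarded unless you print it or save it somewhere.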

Upvotes: 2
