COLD ICE

Reputation: 850

Storing each element from each RDD to a new List

I am trying to store each element from each RDD in a new list. I can print the elements, but I cannot store them in a list, or even in a string variable.

This is the code:

...
    var hashtags = joined_d
      .map(x => ((x._1, x._2._1._1, x._2._2, x._2._1._4), getHashTags(x._2._1._4)))
      .transform(rdd => rdd.map { case (x, list) =>
        if (list.length > 0) list.map(k => (k, (x._1, x._2, x._3, x._4, 1)))
        else List((x._1.toString, (x._1, x._2, x._3, x._4, 0)))
      })

Now when storing the elements like:

    val arr = new ArrayBuffer[String]();
    var hashtags_pair = hashtags.foreachRDD(rdd => 
    rdd.foreach(l => l.foreach(x =>  arr += x._1)))

Then printing the values out:

    arr.foreach(println) // Not working

But printing the values directly, without storing them, works:

    var hashtags_pair = hashtags.foreachRDD(rdd => 
    rdd.foreach(l => l.foreach(x => println(x._1)))) // It's working

Upvotes: 1

Views: 391

Answers (1)

Avishek Bhattacharya

Reputation: 6984

No, you can't store the output of a `map` (or a `foreach`) on an RDD in a local array this way. An RDD is a distributed dataset, and Spark executes the operation on different executors in parallel. The driver serializes only the closure of that operation, together with the variables it captures, and sends it to the executors for execution.
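The effect of shipping a closure can be imitated without Spark. The following is only a sketch: `AppendTask`, `ClosureCopyDemo` and `roundTrip` are hypothetical names, and plain Java serialization stands in for whatever Spark does internally. It shows that a serialized task carries its own copy of a captured buffer.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a closure that captures a driver-side buffer.
class AppendTask(val buf: ArrayBuffer[String]) extends Serializable {
  def run(value: String): Unit = { buf += value; () }
}

object ClosureCopyDemo {
  // Serialize and deserialize an object, roughly what happens
  // when a task is sent from one JVM to another.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val driverBuf = new ArrayBuffer[String]()          // plays the role of `arr`
    val shipped = roundTrip(new AppendTask(driverBuf)) // the "executor" copy
    shipped.run("#spark")                              // mutates the copy only
    println(s"executor copy: ${shipped.buf.size}, driver copy: ${driverBuf.size}")
  }
}
```

The deserialized task appends to its own buffer, while the original buffer stays empty.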

Here the declared array is local to the driver. It is not shared with the executors: each task works on its own deserialized copy, so elements added there never reach the driver's `arr`.
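One common way around this is to bring the data back to the driver with `collect()` inside `foreachRDD`, whose function body runs on the driver once per batch after the streaming context has started. The sketch below is not the asker's pipeline: `CollectHashtags`, the `queueStream` input and the simplified `(String, Int)` element type are assumptions made to keep the example self-contained, and `collect()` is only reasonable when each batch is small enough to fit in driver memory.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable.{ArrayBuffer, Queue}

object CollectHashtags {
  def run(): Seq[String] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("CollectHashtags")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed stand-in for the question's `hashtags` DStream:
    // one RDD whose elements are lists of (hashtag, count) pairs.
    val queue = Queue[RDD[List[(String, Int)]]]()
    queue += ssc.sparkContext.parallelize(Seq(List(("#spark", 1)), List(("#scala", 1))))
    val hashtags = ssc.queueStream(queue)

    val arr = new ArrayBuffer[String]()
    hashtags.foreachRDD { rdd =>
      // flatMap runs on the executors; collect() returns the keys to the driver.
      val keys = rdd.flatMap(_.map(_._1)).collect()
      // This append happens on the driver, so it updates the real `arr`.
      arr.synchronized { arr ++= keys; () }
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(3000) // let a few batches run
    ssc.stop()
    arr.synchronized { arr.toList }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

Reading `arr` only makes sense after at least one batch has been processed, which is why the sketch waits before stopping the context. For counters or sums, an accumulator may be a better fit than collecting elements.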

Upvotes: 1
