Reputation: 1
Here is the simplified code for the issue:
import scala.collection.mutable

class TestClass extends Serializable {
  // must be a mutable Map so addItem can update it in place
  val map = mutable.Map[String, String]()

  private def addItem(s: String): Unit = {
    val sArr = s.split(",")
    map(sArr(0)) = sArr(1)
    println("***TEST item added: " + sArr(0) + "->" + sArr(1))
    println("***TEST map size: " + map.size)
  }

  def test(): Unit = {
    // "spark" is an existing SparkSession
    val itemsFile = spark.sparkContext.textFile("./items.txt")
    itemsFile.foreach(addItem(_))
    // problem: the output of the line below is 0!
    println("***TEST map size is " + map.size)
  }
}
addItem() adds a (k, v) pair to the object's member variable "map". test() reads lines from a file (each line is a (k, v) pair) into an RDD, then processes each line to add the corresponding (k, v) to "map".
When test() is called, we can see that addItem() is called successfully every time and the size of "map" keeps increasing. But when the last println() executes, the map is empty, so the size is 0...
Apparently the member variable "map" of the class instance (the object) isn't the same one that gets used inside "itemsFile.foreach()". But why? (I'm new to Spark.) And how can we use a Spark RDD to process data into a member variable and keep the result after the processing?
Thanks very much!
Upvotes: 0
Views: 24
Reputation: 1
I found the reason: "itemsFile.foreach( addItem(_) )" actually runs on the executor(s), where each task works on its own serialized copy of the object, so the driver's "map" is never updated. If you want to print the result, it has to be collect()ed and sent back to the driver.
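For reference, here is a minimal sketch of that driver-side approach, assuming the same "./items.txt" file (one "key,value" per line) and a SparkSession passed in as spark: the (k, v) pairs are built on the executors and brought back with collectAsMap().

import org.apache.spark.sql.SparkSession

object ItemsToMap {
  def test(spark: SparkSession): Unit = {
    // Read the items file; each line is "key,value"
    val itemsFile = spark.sparkContext.textFile("./items.txt")

    // Parse each line into a (key, value) pair on the executors,
    // then collect all pairs back to the driver as a Map
    val map = itemsFile
      .map { s =>
        val sArr = s.split(",")
        (sArr(0), sArr(1))
      }
      .collectAsMap()

    // The map now lives on the driver, so its size is correct
    println("***TEST map size is " + map.size)
  }
}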
Upvotes: 0