Reputation: 2661
Can someone tell me why I don't see the exact same counts printed in these two cases:
val count = myRdd.count()
println("Count from count() function: " + count)
var counters: Map[String, Int] = Map()
myRdd.foreach(i => {
  counters = counters.updated(i, counters.getOrElse(i, 0) + 1)
})
counters.foreach(i => {
  println("My key: " + i._1 + " Count: " + i._2)
})
Just trying to understand the 'map.updated' function.
Also, if I change 'counters' to:
var counters: Map[String, Long] = Map()
I get a compiler error:
Error:(58, 65) type mismatch;
found : Long(1L)
required: String
counters = counters.updated(i, counters.getOrElse(i, 0) + 1L)
Why do I get this compiler error? Why is it saying that 'String' is required?
Upvotes: 0
Views: 130
Reputation: 232
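You cannot share a mutable variable (var counters) across workers. Spark serializes the closure passed to foreach and ships a copy to each executor, so the updates are applied to those copies and the driver's counters is not reliably updated.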
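To make the point about worker copies concrete, here is a minimal plain-Python sketch that does not use Spark at all. It only imitates the effect of closure serialization with `copy.deepcopy`; the function name `run_foreach_on_workers` and the two hand-made partitions are hypothetical, chosen for illustration.

```python
import copy


def run_foreach_on_workers(partitions, captured_state):
    """Imitate a distributed foreach: every task gets its own copy
    of the captured state and updates only that copy."""
    worker_states = []
    for partition in partitions:
        # Stand-in for closure serialization: the task never sees
        # the driver's original object, only a copy of it.
        local = copy.deepcopy(captured_state)
        for item in partition:
            local[item] = local.get(item, 0) + 1
        worker_states.append(local)
    return worker_states


driver_counters = {}
partitions = [["a", "b", "a"], ["b", "c"]]  # hypothetical data split in two
worker_copies = run_foreach_on_workers(partitions, driver_counters)

print(driver_counters)  # {}
print(worker_copies)    # [{'a': 2, 'b': 1}, {'b': 1, 'c': 1}]
```

The driver's map stays empty while each simulated worker holds a partial count. On a real cluster the same thing happens to the var in the question; in local mode the tasks may run in the driver's JVM and update the original, which is why the printed counts can look partly right but should not be relied on.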
I suggest starting from the basic word count exercise.
Something like this:
myrdd.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
This gives you an RDD of (word, count) pairs, which is what your counters map was meant to hold. Is this your expected result?
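For readers without a Spark installation at hand, the shape of the map and reduceByKey steps can be sketched in plain Python. This is an illustration of the idea only, not Spark's implementation; `map_to_pairs`, `reduce_by_key`, and the sample partitions are hypothetical names and data.

```python
from collections import defaultdict
from functools import reduce


def map_to_pairs(partition):
    """Map step: turn every word into a (word, 1) pair."""
    return [(word, 1) for word in partition]


def reduce_by_key(pairs, func):
    """Reduce step: group values by key, then fold each group with func."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


partitions = [["a", "b", "a"], ["b", "c"]]  # hypothetical data split in two
pairs = [pair for part in partitions for pair in map_to_pairs(part)]
counts = reduce_by_key(pairs, lambda a, b: a + b)

print(counts)                # {'a': 2, 'b': 2, 'c': 1}
print(sum(counts.values()))  # 5
```

Because the result is built by combining values instead of mutating a shared variable, the per-key counts add up to the total number of elements, which is the number count() would report for the same data.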
Upvotes: 2