Reputation: 49
I am new to Spark and probably don't have a good understanding of reduceByKey. For each point, I want to keep the ID of the cluster that is nearest to it.
distancesPointMicrocluster: RDD[(Point, (Int, Double))]  // (point, (clusterId, distance))
val nearestClusterToPoint = distancesPointMicrocluster.reduceByKey((x,y) => if (x._2 < y._2) x else y )
[Image: input and output of the function]
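To make the intent concrete, here is a minimal sketch of the same reduceByKey call on made-up data, run in local mode. The `Point` case class, its fields, the cluster IDs, and the distances are all assumptions for illustration; the real `Point` type is not shown above.

```scala
import org.apache.spark.sql.SparkSession

object NearestClusterSketch {
  // Hypothetical stand-in for the Point type from the question.
  case class Point(id: Long, x: Double, y: Double)

  // Returns, per point id, the (clusterId, distance) pair with the smallest distance.
  def run(): Map[Long, (Int, Double)] = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nearest-cluster-sketch")
      .getOrCreate()
    try {
      val sc = spark.sparkContext
      val p1 = Point(1L, 0.0, 0.0)
      val p2 = Point(2L, 5.0, 5.0)

      // (point, (clusterId, distance)) pairs; the values are invented.
      val distancesPointMicrocluster = sc.parallelize(Seq(
        (p1, (10, 3.2)),
        (p1, (11, 0.7)),
        (p2, (10, 1.5)),
        (p2, (11, 4.1))
      ))

      // Same reduction as in the question: keep the pair with the smaller distance.
      val nearestClusterToPoint =
        distancesPointMicrocluster.reduceByKey((x, y) => if (x._2 < y._2) x else y)

      nearestClusterToPoint
        .map { case (point, best) => (point.id, best) }
        .collect()
        .toMap
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

With this data, point 1 should end up with cluster 11 (distance 0.7) and point 2 with cluster 10 (distance 1.5). Note that this only works if the keys compare equal across records: a case class gives structural equality, so two `Point` values with the same fields land in the same group.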
Upvotes: 2
Views: 185
Reputation: 49
The problem wasn't the reduceByKey function but the fact that I hadn't cached the points in memory. As a result, the points were recreated on every action, so the pointIds weren't the same from one action to the next.
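A minimal sketch of the effect, assuming the IDs are assigned non-deterministically when the points are built (here with `UUID.randomUUID()`, which is only a guess at how such IDs might arise; the `Point` class and the coordinates are hypothetical too). Without persisting, each action re-runs the `map` and produces fresh IDs; with `persist`, later actions reuse the stored partitions.

```scala
import java.util.UUID

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CachedPointsSketch {
  // Hypothetical Point type with a randomly generated id.
  case class Point(id: String, x: Double, y: Double)

  // Returns (ids stable without caching, ids stable with caching).
  def run(): (Boolean, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cached-points-sketch")
      .getOrCreate()
    try {
      val sc = spark.sparkContext
      val coords = sc.parallelize(Seq((0.0, 0.0), (5.0, 5.0), (9.0, 1.0)))

      // Not persisted: every action recomputes the map and draws new ids.
      val points = coords.map { case (x, y) => Point(UUID.randomUUID().toString, x, y) }
      val uncachedFirst = points.map(_.id).collect().toSet
      val uncachedSecond = points.map(_.id).collect().toSet

      // Persisted: the first action materializes the points, later actions reuse them.
      val cachedPoints = coords
        .map { case (x, y) => Point(UUID.randomUUID().toString, x, y) }
        .persist(StorageLevel.MEMORY_ONLY)
      val cachedFirst = cachedPoints.map(_.id).collect().toSet
      val cachedSecond = cachedPoints.map(_.id).collect().toSet

      (uncachedFirst == uncachedSecond, cachedFirst == cachedSecond)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (uncachedStable, cachedStable) = run()
    println(s"ids stable without caching: $uncachedStable")
    println(s"ids stable with caching:    $cachedStable")
  }
}
```

One caveat: caching is best effort. If a cached partition is evicted or an executor is lost, Spark recomputes it and the IDs in that partition change again. Deriving IDs deterministically from the data may be a more robust choice when the IDs are used as keys.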
Upvotes: 1