Reputation: 99
I have created an RDD from an array in Spark. I want to take the n smallest elements from each partition. My approach is to sort the iterator of each partition, take the first n elements, and replace them with the elements of arr1. The way I have done it is
import scala.util.Random

val arr = (1 to 50000).toArray
val n = 50
val iterations = 100
val r = new Random()
val arr1 = Array.fill(n)(r.nextInt(10))
val rdd = sc.parallelize(arr, 3)

rdd.mapPartitionsWithIndex { (index, it) =>
  // An Iterator is single-pass and has no sortWith or update,
  // so materialize it, sort, overwrite the first n slots, and
  // hand back a fresh iterator.
  val sorted = it.toArray.sorted
  for (i <- 0 until n) {
    sorted(i) = arr1(i)
  }
  sorted.iterator
}
Is there a more efficient way to perform the same task in Scala?
Upvotes: 2
Views: 568
Reputation: 6218
rdd.sortBy(x => x)
   .foreachPartition(part => println(part.take(n).toList))
Replace println with your use case.
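Note that `sortBy` shuffles and re-sorts the whole RDD, so the partitions it iterates over are no longer the original ones. If you only need the n smallest elements of each *original* partition, a full per-partition sort can be avoided with a bounded max-heap, which does O(size · log n) work instead of O(size · log size). A minimal sketch (the helper name `smallestN` is my own):

```scala
import scala.collection.mutable

// Keep only the n smallest elements seen by the iterator, using a
// max-heap bounded at n entries: once full, an incoming element only
// displaces the current maximum if it is smaller.
def smallestN(it: Iterator[Int], n: Int): List[Int] = {
  val heap = mutable.PriorityQueue.empty[Int] // max-heap by default
  it.foreach { x =>
    if (heap.size < n) heap.enqueue(x)
    else if (x < heap.head) { heap.dequeue(); heap.enqueue(x) }
  }
  // dequeueAll yields descending order; sort ascending for the caller
  heap.dequeueAll.toList.sorted
}
```

On the RDD this would run once per partition, e.g. `rdd.mapPartitions(it => Iterator.single(smallestN(it, n)))`, without any shuffle.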
Upvotes: 1