yari

Reputation: 99

Efficient Way to take first n sorted elements in Spark Partitions

I have created an RDD from an array in Spark. I want to take the n smallest elements from each partition. Each time, I sort the partition's iterator, take the first n elements, and replace them with the elements of arr1. The way I have done it is:

 var arr = (1 to 50000).toArray
 val n = 50
 val iterations = 100  
 val r = new Random() 
 val arr1 = Array.fill(n)(r.nextInt(10)) 

 val rdd = sc.parallelize(arr,3)
 rdd.mapPartitionsWithIndex { (index, it) =>
   // An Iterator cannot be sorted or updated in place, so
   // materialize the partition, sort it, then overwrite the
   // first n elements with those of arr1.
   val sorted = it.toArray.sorted
   for (i <- 0 until n) {
     sorted(i) = arr1(i)
   }
   sorted.iterator
 }
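For reference, the n smallest elements of a partition can be found without sorting the whole partition: a max-heap capped at size n keeps only the current n smallest while streaming through the iterator. A minimal sketch of such a helper (plain Scala, usable inside mapPartitions; the name smallestN is my own):

```scala
import scala.collection.mutable

// Return the n smallest elements of `it` in ascending order,
// holding at most n elements in memory (a max-heap of size n).
def smallestN(it: Iterator[Int], n: Int): Iterator[Int] = {
  val heap = mutable.PriorityQueue.empty[Int] // max-heap by default
  it.foreach { x =>
    heap.enqueue(x)
    if (heap.size > n) heap.dequeue() // evict the current largest
  }
  heap.dequeueAll.reverse.iterator // largest-first, so reverse
}
```

This does O(p log n) work per partition of p elements, versus O(p log p) for a full sort.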

I want to ask: is there a more efficient way to perform the same task in Scala?

Upvotes: 2

Views: 568

Answers (1)

undefined_variable

Reputation: 6218

rdd.sortBy(x => x)
  .foreachPartition(y => println(y.take(n).toList))

Replace println with your use case.

Upvotes: 1
