Reputation: 79
I have an RDD[(Int, Array[Double])] like
1, Array(2.0,5.0,6.3)
5, Array(1.0,3.3,9.5)
1, Array(5.0,4.2,3.1)
2, Array(9.6,6.3,2.3)
1, Array(8.5,2.5,1.2)
5, Array(6.0,2.4,7.8)
2, Array(7.8,9.1,4.2)
I want to sort the RDD by the first column, with the distinct key values in a custom order (1,5,2) rather than in numeric order.
Required output:
1, Array(2.0,5.0,6.3)
1, Array(5.0,4.2,3.1)
1, Array(8.5,2.5,1.2)
5, Array(1.0,3.3,9.5)
5, Array(6.0,2.4,7.8)
2, Array(9.6,6.3,2.3)
2, Array(7.8,9.1,4.2)
I have tried commands like
rdd.groupBy()
rdd.sortBy()
Both of these yield output sorted numerically by key, like
1, Array(2.0,5.0,6.3)
1, Array(5.0,4.2,3.1)
1, Array(8.5,2.5,1.2)
2, Array(9.6,6.3,2.3)
2, Array(7.8,9.1,4.2)
5, Array(1.0,3.3,9.5)
5, Array(6.0,2.4,7.8)
How can I sort the RDD so that the distinct values in the first column appear in the order (1,5,2)?
Upvotes: 0
Views: 537
Reputation: 563
You can first define your ordering as in your example:
val ordering = Seq(1, 5, 2).zipWithIndex.toMap  // Map(1 -> 0, 5 -> 1, 2 -> 2)
And then apply it:
rdd.sortBy { case (k, _) => ordering(k) }
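To make the idea concrete, here is a minimal sketch of the same rank-lookup approach. It uses a plain Scala Seq as a stand-in for the RDD so that it runs without a Spark dependency. The object name CustomOrderSort and the step that derives the key order from first appearance are assumptions added for illustration; they are not part of the question.

```scala
object CustomOrderSort {
  // Local stand-in for the RDD in the question (assumption: a Seq instead of an RDD).
  val rows: Seq[(Int, Array[Double])] = Seq(
    1 -> Array(2.0, 5.0, 6.3),
    5 -> Array(1.0, 3.3, 9.5),
    1 -> Array(5.0, 4.2, 3.1),
    2 -> Array(9.6, 6.3, 2.3),
    1 -> Array(8.5, 2.5, 1.2),
    5 -> Array(6.0, 2.4, 7.8),
    2 -> Array(7.8, 9.1, 4.2)
  )

  // Rank each key by its first appearance in the data: Map(1 -> 0, 5 -> 1, 2 -> 2).
  // Seq.distinct keeps the first occurrence of each element, in order.
  val ordering: Map[Int, Int] = rows.map(_._1).distinct.zipWithIndex.toMap

  // Sort by the rank of the key instead of by the key itself.
  // Seq.sortBy is stable, so rows sharing a key keep their original relative order.
  val sorted: Seq[(Int, Array[Double])] = rows.sortBy { case (k, _) => ordering(k) }

  // Keys only, for a quick check of the resulting order.
  val sortedKeys: Seq[Int] = sorted.map(_._1)
}
```

On a real RDD the sortBy call has the same shape. One caveat, stated as an assumption rather than a guarantee: after a distributed sort, the relative order of rows that share a key may not match the input order. If that matters, one option is to attach a position with rdd.zipWithIndex() first and sort by the pair (ordering(k), index).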
Upvotes: 1