Parvathy K

Reputation: 79

Sort RDD according to a distinct value in a column

I have an RDD[(Int, Array[Double])] like

1, Array(2.0,5.0,6.3) 
5, Array(1.0,3.3,9.5)
1, Array(5.0,4.2,3.1)
2, Array(9.6,6.3,2.3)
1, Array(8.5,2.5,1.2)
5, Array(6.0,2.4,7.8)
2, Array(7.8,9.1,4.2)

I want to sort the RDD so that rows are grouped by the distinct values in the 1st column, in the order those values first appear (1,5,2)

Required Output

1, Array(2.0,5.0,6.3)
1, Array(5.0,4.2,3.1)
1, Array(8.5,2.5,1.2)
5, Array(1.0,3.3,9.5)
5, Array(6.0,2.4,7.8)
2, Array(9.6,6.3,2.3)
2, Array(7.8,9.1,4.2)

I have tried commands like

rdd.groupBy()
rdd.sortBy()

But these all yield output sorted by key in ascending order, like

1, Array(2.0,5.0,6.3)
1, Array(5.0,4.2,3.1)
1, Array(8.5,2.5,1.2)
2, Array(9.6,6.3,2.3)
2, Array(7.8,9.1,4.2)
5, Array(1.0,3.3,9.5)
5, Array(6.0,2.4,7.8)

How can I sort the RDD so that the distinct values in the 1st column follow the order

(1,5,2)?

Upvotes: 0

Views: 537

Answers (1)

Ben Horsburgh

Reputation: 563

You can first define your ordering as in your example:

// Map each key to its position in the desired order: 1 -> 0, 5 -> 1, 2 -> 2
val ordering = List(1, 5, 2).zipWithIndex.toMap

And then apply it:

rdd.sortBy { case (k, _) => ordering(k) }
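
If you would rather not hard-code (1,5,2), the ordering can be derived from the data itself. The sketch below uses plain Scala collections so it runs without a cluster; the object name and the helpers `firstAppearanceRank` and `sortByKeyOrder` are made up for illustration. It also sorts on the original row index as a second key, so rows with the same key keep their input order as in the required output. On an RDD, `zipWithIndex()` followed by `sortBy` on the same pair should play the same role, but I have not verified that Spark's `sortBy` alone preserves the order of rows with equal keys, so treat that as an assumption.

```scala
object CustomKeyOrder {
  // Hypothetical helper: rank each key by the position of its first appearance.
  // Seq.distinct keeps the first occurrence of each element, in order.
  def firstAppearanceRank(keys: Seq[Int]): Map[Int, Int] =
    keys.distinct.zipWithIndex.toMap

  // Hypothetical helper: sort rows by the rank of their key, then by the
  // original row index so that rows sharing a key keep their input order.
  def sortByKeyOrder(
      rows: Seq[(Int, Array[Double])],
      rank: Map[Int, Int]
  ): Seq[(Int, Array[Double])] =
    rows.zipWithIndex
      .sortBy { case ((k, _), i) => (rank(k), i) }
      .map(_._1)

  def main(args: Array[String]): Unit = {
    // Sample data copied from the question.
    val rows = Seq(
      1 -> Array(2.0, 5.0, 6.3),
      5 -> Array(1.0, 3.3, 9.5),
      1 -> Array(5.0, 4.2, 3.1),
      2 -> Array(9.6, 6.3, 2.3),
      1 -> Array(8.5, 2.5, 1.2),
      5 -> Array(6.0, 2.4, 7.8),
      2 -> Array(7.8, 9.1, 4.2)
    )

    val rank = firstAppearanceRank(rows.map(_._1))

    sortByKeyOrder(rows, rank).foreach { case (k, v) =>
      println(s"$k, ${v.mkString("Array(", ",", ")")}")
    }
  }
}
```

Note that `ordering(k)` in the answer above, like `rank(k)` here, throws if a key is missing from the map; `getOrElse(k, Int.MaxValue)` is one way to push unknown keys to the end.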

Upvotes: 1
