Vijay Innamuri

Reputation: 4372

Order by value in spark pair RDD

I have a Spark pair RDD of (key, count), as shown below:

Array[(String, Int)] = Array((a,1), (b,2), (c,1), (d,3))

Using the Spark Scala API, how can I get a new pair RDD that is sorted by value?

Required result: Array((d,3), (b,2), (a,1), (c,1))

Upvotes: 21

Views: 54425

Answers (2)

Nagaraj Vittal

Reputation: 901

Sort by key and value in ascending and descending order

val textfile = sc.textFile("file:///home/hdfs/input.txt")
val words = textfile.flatMap(line => line.split(" "))
//Sort by value in descending order. For ascending order remove 'false' argument from sortBy
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortBy(_._2,false)
//for ascending order by value
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortBy(_._2)

//Sort by key in ascending order
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortByKey
//Sort by key in descending order
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortByKey(false)

Alternatively, swap the key and value and then apply sortByKey:

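//Sort by value by swapping key and value and then using sortByKey
val sortbyvalue = words.map( word => (word,1)).reduceByKey((a,b) => a+b)
val descendingSortByvalue = sortbyvalue.map(x => (x._2,x._1)).sortByKey(false)
//toDF needs the SQL implicits in scope (spark-shell imports them automatically)
descendingSortByvalue.toDF.show
//After the swap each element is (count, word). Collect to the driver first so the
//printed order matches the sorted order (only do this when the result is small).
descendingSortByvalue.collect.foreach {n => {
val count = n._1
val word = n._2
println(s"$word:$count")}}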
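
The swap approach leaves the RDD as (count, word) pairs, while the question asks for (key, count). A minimal sketch of swapping back after the sort is below. The local SparkContext and the names counts and sortedByValue are assumptions made for illustration; in spark-shell, sc already exists.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only (assumption); spark-shell already provides `sc`.
val conf = new SparkConf().setAppName("swap-sort-swap").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Sample data taken from the question.
val counts = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 1), ("d", 3)))

// Swap to (count, key), sort by the count in descending order,
// then swap back so the result is again a (key, count) pair RDD.
val sortedByValue = counts
  .map(_.swap)
  .sortByKey(ascending = false)
  .map(_.swap)

// collect() returns the elements in sorted order.
// The relative order of keys with equal counts is not guaranteed here.
val resultSwap = sortedByValue.collect()
```

The final map only changes the shape of each element, so the ordering produced by sortByKey is kept.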

Upvotes: 9

Gábor Bakos

Reputation: 9100

This should work:

//Assuming the pair's second type has an Ordering, which is the case for Int
rdd.sortBy(_._2) // same as rdd.sortBy(pair => pair._2); ascending by default
rdd.sortBy(_._2, ascending = false) // descending, as in the required result

(Though you might want to take the key into account too when there are ties.)
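
One possible way to handle ties, sketched under a few assumptions: the counts are non-negative Ints (so negating them is safe), ties should be broken by key in ascending order, and a local SparkContext is created only for illustration. The names sorted and result are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only (assumption); spark-shell already provides `sc`.
val conf = new SparkConf().setAppName("sort-by-value").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Sample data taken from the question.
val rdd = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 1), ("d", 3)))

// The sort key is the tuple (-count, key). Tuples are compared element by element,
// so larger counts come first and equal counts fall back to ascending key order.
// Negation assumes the count is never Int.MinValue, which holds for counts >= 0.
val sorted = rdd.sortBy { case (key, count) => (-count, key) }

val result = sorted.collect()
// Array((d,3), (b,2), (a,1), (c,1))
```

With the sample data this gives the order requested in the question, and the result stays the same across runs because the key decides the order among equal counts.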

Upvotes: 42
