zelda26

Reputation: 489

Spark Scala DataFrame groupBy and orderBy

I have a requirement to count the number of occurrences of each (first column, second column) pair and sort the results in descending order of the count. If there is a tie in the count, the pair with the lower number in the second column should be listed first.

The code below works, except for the tie-breaker part. The first row should be 1, 2, 3 because in _c1, 2 is smaller than 4 and both pairs have the same count. How do I order by count descending and _c1 ascending?

new_df.groupBy($"_c0",$"_c1").count().orderBy($"count".desc).limit(10).show()
+---+---+-----+
|_c0|_c1|count|
+---+---+-----+
|  1|  4|    3|
|  1|  2|    3|
|  4|  1|    2|
|  3|  1|    2|
|  3|  4|    2|
|  2|  1|    2|
|  2|  4|    1|
|  1|  7|    1|
|  7|  2|    1|
|  2|  7|    1|
+---+---+-----+

Upvotes: 0

Views: 2301

Answers (1)

MorleyP

Reputation: 58

Try adding count descending and _c1 ascending to the orderBy clause.

new_df.groupBy($"_c0",$"_c1").count().orderBy($"count".desc, $"_c1".asc).limit(10).show()

List the sort columns in the order you want the rules applied. In the example above, the result is ordered by count first, then by _c1.
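
Here is a minimal, self-contained sketch of the same idea, assuming a local Spark session and made-up sample pairs (the data and the PairCounts object name are hypothetical; the column names _c0/_c1 mirror the question's DataFrame):

object PairCounts {
  def main(args: Array[String]): Unit = {
    val spark = org.apache.spark.sql.SparkSession.builder()
      .appName("pair-counts")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input resembling the question's data.
    val new_df = Seq(
      (1, 2), (1, 2), (1, 2),
      (1, 4), (1, 4), (1, 4),
      (3, 1), (3, 1),
      (2, 4)
    ).toDF("_c0", "_c1")

    // Count each (_c0, _c1) pair, sort by count descending,
    // and break ties by the second column ascending.
    new_df.groupBy($"_c0", $"_c1")
      .count()
      .orderBy($"count".desc, $"_c1".asc)
      .limit(10)
      .show()
    // (1, 2, 3) now appears before (1, 4, 3) because 2 < 4.

    spark.stop()
  }
}

The same ordering can also be written with the column functions, e.g. orderBy(desc("count"), asc("_c1")) after importing org.apache.spark.sql.functions.{asc, desc}.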

Upvotes: 4
