Spark Scala DataFrame groupBy and orderBy

Question

I need to count the number of occurrences of each pair of values in the first and second columns and sort the result by count in descending order. If there is a tie in the count, the pair with the lowest number in the second column should be listed first.

The code below works, except for the tie-breaker. The first row should be 1, 2, 3, because in _c1 the value 2 is smaller than 4 and both rows have the same count. How do I order by count descending and by the second column (_c1) ascending?

new_df.groupBy($"_c0",$"_c1").count().orderBy($"count".desc).limit(10).show()

+---+---+-----+
|_c0|_c1|count|
+---+---+-----+
|  1|  4|    3|
|  1|  2|    3|
|  4|  1|    2|
|  3|  1|    2|
|  3|  4|    2|
|  2|  1|    2|
|  2|  4|    1|
|  1|  7|    1|
|  7|  2|    1|
|  2|  7|    1|
+---+---+-----+

MorleyP · Accepted Answer

Add both sort keys to the orderBy clause: count descending, then _c1 (the second column) ascending.

new_df.groupBy($"_c0",$"_c1").count().orderBy($"count".desc, $"_c1".asc).limit(10).show()

List the sort keys in the order you want the rules to be applied. In the example above, rows are ordered by count first, and ties in count are then broken by _c1.
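
To see the tie-breaker in action, here is a minimal sketch that can be pasted into spark-shell. The sample pairs are made up for illustration (they are not the data from the question), the columns are assumed to be integers, and the names top and rows are just placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation; inside spark-shell this returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("pair-count-sketch").getOrCreate()
import spark.implicits._

// Made-up sample pairs: (1,4) x3, (1,2) x3, (3,1) x2, (2,7) x1.
val new_df = Seq(
  (1, 4), (1, 4), (1, 4),
  (1, 2), (1, 2), (1, 2),
  (3, 1), (3, 1),
  (2, 7)
).toDF("_c0", "_c1")

// Primary sort key: count descending. Tie-breaker: _c1 ascending.
val top = new_df
  .groupBy($"_c0", $"_c1")
  .count()
  .orderBy($"count".desc, $"_c1".asc)
  .limit(10)

top.show()

// Pull the rows back as plain tuples (_c0, _c1, count) for inspection.
val rows = top.collect().map(r => (r.getInt(0), r.getInt(1), r.getLong(2))).toList
// rows: List((1,2,3), (1,4,3), (3,1,2), (2,7,1))
```

With this sample, (1,2) and (1,4) both have a count of 3, and (1,2) comes first because 2 is smaller than 4. If the columns were read from a CSV file without a schema, they would be strings and sort lexicographically, so a cast such as $"_c1".cast("int").asc may be needed to get numeric ordering.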
