Reputation: 2001
I have the following data:
yr char cnt
1 a 27
1 g 20
3 b 50
3 z 70
I like to get the year, only max count of cnt field. i.e,
yr char count
1 a 27
3 z 70
I tried to use a SQL like below:
SELECT yr, char, max(count(cnt)) as count
FROM view
GROUP BY yr,char
But it resulted in an error saying the max cannot be used with count in SparkSQL. How can I get the result I want as shown above?
Upvotes: 1
Views: 1498
Reputation: 668
This should work
sql("select a.yr, a.char, a.cnt from view a join (select yr, max(cnt) as cnt from view group by yr) b on a.yr = b.yr and b.cnt = a.cnt").show()
Upvotes: 1
Reputation: 1269753
This would often be done using row_number()
:
select yr, char, cnt
from (select yr, char, count(*) as cnt,
row_number() over (partition by yr order by count(*) desc) as seqnum
from view
group by yr, char
) yc
where seqnum = 1;
Note: In the event of ties, this returns an arbitrary one of them. If you want all of them, use rank()
or dense_rank()
.
Upvotes: 0