Reputation: 779
I want to find the most occurring value per group.
I tried using top(k)(column) but I get below error: Column class is not under aggregate function and not in GROUP BY.
For example: If I have table test_date with columns(pid, value)
pid, value
----------
1,a
1,b
1,a
1,c
I want result :
pid, value
----------
1,a
I tried SELECT pid,top(1)(value) top_value FROM test_data group by pid
I get the error:
Column value is not under aggregate function and not in GROUP BY
I also tried with anyHeavy()
but it only works for values that occur more than in half the cases
Upvotes: 3
Views: 2267
Reputation: 15226
This query should help you:
SELECT
pid,
/*
Decompose the query in parts:
1. groupArray((value, count)): convert the group of rows with the same 'pid' to the array of tuples (value, count)
2. arrayReverseSort: make reverse sorting by 'count' ('x.2' is 'count')
3. [1].1: take the 'value' from the first item of the sorted array
*/
arrayReverseSort(x -> x.2, groupArray((value, count)))[1].1 AS value
FROM
(
SELECT
pid,
value,
count() AS count
FROM test_date
GROUP BY
pid,
value
)
GROUP BY pid
ORDER BY pid ASC
Upvotes: 6