Reputation: 450
I have a table like this
type Total
A 100
A 123
A 154
A 50
A 54
B 200
B 166
B 423
B 342
B 213
C 520
C 130
C 234
C 512
I want to select the top 3 total by groups. How can i do it?
Upvotes: 5
Views: 12739
Reputation: 172964
In most [big data] use cases using ROW_NUMBER() is not fine as it ends up with resource exceeded error. This is because it requires all point of same group be present in same/one node which in case of data skew leads to above mentioned error in BigQuery
Option 1
One of the usual ways to address this issue is using ARRAY_AGG() function as it is in below below example
#standardSQL
SELECT type, total FROM (
SELECT type, ARRAY_AGG(total ORDER BY total DESC LIMIT 3) arr
FROM `project.dataset.table` GROUP BY type
), UNNEST(arr) total
If to run above against data example from your question
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'A' type, 100 total UNION ALL
SELECT 'A', 123 UNION ALL
SELECT 'A', 154 UNION ALL
SELECT 'A', 50 UNION ALL
SELECT 'A', 54 UNION ALL
SELECT 'B', 200 UNION ALL
SELECT 'B', 166 UNION ALL
SELECT 'B', 423 UNION ALL
SELECT 'B', 342 UNION ALL
SELECT 'B', 213 UNION ALL
SELECT 'C', 520 UNION ALL
SELECT 'C', 130 UNION ALL
SELECT 'C', 234 UNION ALL
SELECT 'C', 512
)
SELECT type, total FROM (
SELECT type, ARRAY_AGG(total ORDER BY total DESC LIMIT 3) arr
FROM `project.dataset.table` GROUP BY type
), UNNEST(arr) total
-- ORDER BY type
you will get expected result as
Row type total
1 A 154
2 A 123
3 A 100
4 B 423
5 B 342
6 B 213
7 C 520
8 C 512
9 C 234
Option 2
But there is yet another interesting option to consider for really big data - to use APPROX_TOP_SUM() function as in below example
#standardSQL
SELECT type, value AS total FROM (
SELECT type, APPROX_TOP_SUM(total, total, 3) arr
FROM `project.dataset.table` GROUP BY type
), UNNEST(arr)
obviously, with the same output as above for sample data
Upvotes: 6
Reputation: 1269503
Row number is fine. A fun way to do this in BigQuery is:
select type,
array_agg(total order by total desc limit 3) as top3
from t
group by type;
This puts the values into an array.
Upvotes: 8
Reputation: 37473
You can try using row_number()
select * from
(
select type, total, row_number() over(partition by type order by total desc) as rn
from tablename
)A
where rn<=3
Upvotes: 5