Key Jun
Key Jun

Reputation: 450

Bigquery: Select top 3 with Group By condition

I have a table like this

type    Total
 A       100
 A       123
 A       154
 A       50
 A       54
 B       200
 B       166
 B       423
 B       342
 B       213
 C       520
 C       130
 C       234
 C       512

I want to select the top 3 total by groups. How can i do it?

Upvotes: 5

Views: 12739

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172964

In most [big data] use cases using ROW_NUMBER() is not fine as it ends up with resource exceeded error. This is because it requires all point of same group be present in same/one node which in case of data skew leads to above mentioned error in BigQuery

Option 1

One of the usual ways to address this issue is using ARRAY_AGG() function as it is in below below example

#standardSQL
SELECT type, total FROM (
  SELECT type, ARRAY_AGG(total ORDER BY total DESC LIMIT 3) arr
  FROM `project.dataset.table` GROUP BY type
), UNNEST(arr) total

If to run above against data example from your question

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'A' type, 100 total UNION ALL
  SELECT 'A', 123 UNION ALL
  SELECT 'A', 154 UNION ALL
  SELECT 'A', 50 UNION ALL
  SELECT 'A', 54 UNION ALL
  SELECT 'B', 200 UNION ALL
  SELECT 'B', 166 UNION ALL
  SELECT 'B', 423 UNION ALL
  SELECT 'B', 342 UNION ALL
  SELECT 'B', 213 UNION ALL
  SELECT 'C', 520 UNION ALL
  SELECT 'C', 130 UNION ALL
  SELECT 'C', 234 UNION ALL
  SELECT 'C', 512 
)
SELECT type, total FROM (
  SELECT type, ARRAY_AGG(total ORDER BY total DESC LIMIT 3) arr
  FROM `project.dataset.table` GROUP BY type
), UNNEST(arr) total
-- ORDER BY type   

you will get expected result as

Row type    total    
1   A       154  
2   A       123  
3   A       100  
4   B       423  
5   B       342  
6   B       213  
7   C       520  
8   C       512  
9   C       234    

Option 2

But there is yet another interesting option to consider for really big data - to use APPROX_TOP_SUM() function as in below example

#standardSQL
SELECT type, value AS total FROM (
  SELECT type, APPROX_TOP_SUM(total, total, 3) arr
  FROM `project.dataset.table` GROUP BY type
), UNNEST(arr)  

obviously, with the same output as above for sample data

Upvotes: 6

Gordon Linoff
Gordon Linoff

Reputation: 1269503

Row number is fine. A fun way to do this in BigQuery is:

select type,
       array_agg(total order by total desc limit 3) as top3
from t
group by type;

This puts the values into an array.

Upvotes: 8

Fahmi
Fahmi

Reputation: 37473

You can try using row_number()

select * from
(
select type, total, row_number() over(partition by type order by total desc) as rn
from tablename
)A
where rn<=3

Upvotes: 5

Related Questions