Nivi
Nivi

Reputation: 1137

Query different number of records for each category in SQL

I have a table that looks like the following:

col1 | col2 | col3 | col4 
A    | 1    | 2    | 4
A    | 2    | 5    | 3
A    | 5    | 1    | 6
B    | 3    | 1    | 2
B    | 4    | 4    | 4

I have another table where the records are unique and looks like the following:

col1 | col2
A    | 2
B    | 1

I want to query Table 1 in such a way that I filter out only n number of records for each category in Table 1 based on the value the categories have in Table 2.

Based on Table 2 I need to extract 2 records for A and 1 record for B. I need the resulting queried table to look like the following:

col1 | col2 | col3 | col4
A    | 2    | 5    | 3
A    | 1    | 2    | 4
B    | 3    | 1    | 2

The choice of the records are made based on col4 sorted in ascending order. I am currently tring to do this on BigQuery.

Upvotes: 3

Views: 66

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173003

Below is for BigQuery Standard SQL

#standardSQL
SELECT t.* 
FROM (
  SELECT ARRAY_AGG(t1 ORDER BY t1.col4) arr, MIN(t2.col2) cnt
  FROM table1 t1 JOIN table2 t2 ON t1.col1 = t2.col1
  GROUP BY t1.col1
), UNNEST(arr) t WITH OFFSET num
WHERE num < cnt   

you can test / play with it using dummy data from your question as below

#standardSQL
WITH `table1` AS (
  SELECT 'A' col1, 1 col2, 2 col3, 4 col4 UNION ALL
  SELECT 'A', 2, 5, 3 UNION ALL
  SELECT 'A', 5, 1, 6 UNION ALL
  SELECT 'B', 3, 1, 2 UNION ALL
  SELECT 'B', 4, 4, 4 
), `table2` AS (
  SELECT 'A' col1, 2 col2 UNION ALL
  SELECT 'B', 1 
)
SELECT t.* 
FROM (
  SELECT ARRAY_AGG(t1 ORDER BY t1.col4) arr, MIN(t2.col2) cnt
  FROM table1 t1 JOIN table2 t2 ON t1.col1 = t2.col1
  GROUP BY t1.col1
), UNNEST(arr) t WITH OFFSET num
WHERE num < cnt

with output as

Row col1    col2    col3    col4     
1   A       2       5       3    
2   A       1       2       4    
3   B       3       1       2    

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269943

You can use row_number() and join:

select t1.col1, t1.col2, t1.col3, t1.col4
from (select t1.*, row_number() over (partition by col1 order by col4) as seqnum
      from table1 t1
     ) t1 join
     table2 t2
     on t2.col1 = t1.col1 and t1.seqnum <= t2.col2
order by t1.col1, t1.col4;

Upvotes: 2

Related Questions