python_interest
python_interest

Reputation: 874

How to count threshold number of occurences per group

Sorry for the title :) and I couldnt find proper suitable title for this question.

I have a table where I wanted to count the occurences only upto certain amount of occurences in SQL big query. For example

A     B     C
abc   ab    80
abc   ab    80
abc   ab    80
bcd   ah    90

Solution expected grouping by A If I give group by it counts all the occurences of column A (abc)

A     B     C    count
abc   ab    80   2     *     
abc   ab    80   1     *
bcd   ah    90   1 

where * indicates threshold and splitting of row values

I wanted the threshold to have count 2. Hence we can restrict it

Upvotes: 0

Views: 246

Answers (2)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521437

We can try creating a partition on A, B, C, and the row number minus one divided by two:

WITH cte AS (
    SELECT A, B, C,
        (ROW_NUMBER() OVER (PARTITION BY A, B, C ORDER BY A, B, C) - 1) / t2.th AS rn
    FROM yourTable t1
    INNER JOIN threshholds t2
        ON t1.A = t2.A
)

SELECT A, B, C, COUNT(*) AS count
FROM cte
GROUP BY A, B, C, rn
ORDER BY A, B, C, COUNT(*) DESC;

This assumes that there is a second table threshholds which persists the different threshholds for each group.

Upvotes: 2

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173003

Below is for BigQuery Standard SQL

#standardSQL
SELECT * EXCEPT(grp), COUNT(1) AS count   
FROM (
  SELECT a, b, c, DIV(ROW_NUMBER() OVER(PARTITION BY a, b, c) - 1, 2) grp
  FROM `project.dataset.table`
) t
GROUP BY a, b, c, grp   

if to apply to sample data from your question - result will be

Row a   b   c   count    
1   abc ab  80  1    
2   abc ab  80  2    
3   bcd ah  90  1      

How can this be solved when the threshold is different each group. Say for abc it is 2 and may be for bcd it may be 3 and so on

#standardSQL
WITH thresholds AS (
  SELECT 'abc' a, 2 threshold UNION ALL
  SELECT 'bcd', 3
)
SELECT a, b, c, COUNT(1) AS count   
FROM (
  SELECT a, b, c, 
    DIV(ROW_NUMBER() OVER(PARTITION BY a, b, c) - 1, threshold) grp
  FROM `project.dataset.table` x
  JOIN thresholds y USING(a)
) t
GROUP BY a, b, c, grp

Upvotes: 1

Related Questions