Reputation: 874
Sorry for the title :) and I couldnt find proper suitable title for this question.
I have a table where I wanted to count the occurences only upto certain amount of occurences in SQL big query. For example
A B C
abc ab 80
abc ab 80
abc ab 80
bcd ah 90
Solution expected grouping by A If I give group by it counts all the occurences of column A (abc)
A B C count
abc ab 80 2 *
abc ab 80 1 *
bcd ah 90 1
where * indicates threshold and splitting of row values
I wanted the threshold to have count 2. Hence we can restrict it
Upvotes: 0
Views: 246
Reputation: 521437
We can try creating a partition on A
, B
, C
, and the row number minus one divided by two:
WITH cte AS (
SELECT A, B, C,
(ROW_NUMBER() OVER (PARTITION BY A, B, C ORDER BY A, B, C) - 1) / t2.th AS rn
FROM yourTable t1
INNER JOIN threshholds t2
ON t1.A = t2.A
)
SELECT A, B, C, COUNT(*) AS count
FROM cte
GROUP BY A, B, C, rn
ORDER BY A, B, C, COUNT(*) DESC;
This assumes that there is a second table threshholds
which persists the different threshholds for each group.
Upvotes: 2
Reputation: 173003
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(grp), COUNT(1) AS count
FROM (
SELECT a, b, c, DIV(ROW_NUMBER() OVER(PARTITION BY a, b, c) - 1, 2) grp
FROM `project.dataset.table`
) t
GROUP BY a, b, c, grp
if to apply to sample data from your question - result will be
Row a b c count
1 abc ab 80 1
2 abc ab 80 2
3 bcd ah 90 1
How can this be solved when the threshold is different each group. Say for abc it is 2 and may be for bcd it may be 3 and so on
#standardSQL
WITH thresholds AS (
SELECT 'abc' a, 2 threshold UNION ALL
SELECT 'bcd', 3
)
SELECT a, b, c, COUNT(1) AS count
FROM (
SELECT a, b, c,
DIV(ROW_NUMBER() OVER(PARTITION BY a, b, c) - 1, threshold) grp
FROM `project.dataset.table` x
JOIN thresholds y USING(a)
) t
GROUP BY a, b, c, grp
Upvotes: 1