Mad Scientist
Mad Scientist

Reputation: 18550

Only return top n results for each group in GROUPING SETS query

I have a rather complicated query performing some aggregations using GROUPING SETS, it looks roughly like the following:

SELECT 
    column1,
    [... more columns here]
    count(*)
FROM table_a 
GROUP BY GROUPING SETS (
    column1,
    [... more columns here]
)
ORDER BY count DESC

This works very well in general, as long as the number of results for each group is reasonably small. But I have some columns in this query that can have a large number of distinct values, which results in a large amount of rows returned by this query.

I'm actually only interested in the top results for each group in the grouping set. But there doesn't seem to be an obvious way to limit the number of results per group in a query using grouping sets, LIMIT doesn't work in this case.

I'm using PostgreSQL 9.6, so I'm not restricted in which newer features I can use here.

So what my query does is something like this:

| column1 | column2 | count |
|---------|---------|-------|
| DE      |         | 32455 |
| US      |         | 3445  |
| FR      |         | 556   |
| GB      |         | 456   |
| RU      |         | 76    |
|         | 12      | 10234 |
|         | 64      | 9805  |
|         | 2       | 6043  |
|         | 98      | 2356  |
|         | 65      | 1023  |
|         | 34      | 501   |

What I actually want is something that only returns the top 3 results:

| column1 | column2 | count |
|---------|---------|-------|
| DE      |         | 32455 |
| US      |         | 3445  |
| FR      |         | 556   |
|         | 12      | 10234 |
|         | 64      | 9805  |
|         | 2       | 6043  |

Upvotes: 4

Views: 929

Answers (2)

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125504

Use row_number and grouping

select a, b, total
from (
    select 
        a, b, total, 
        row_number() over(
            partition by g 
            order by total desc
        ) as rn
    from (
        select a, b, count(*) as total, grouping ((a),(b)) as g
        from t
        group by grouping sets ((a),(b))
    ) s
) s
where rn <= 3

Upvotes: 3

Oto Shavadze
Oto Shavadze

Reputation: 42863

Something like this:

WITH T(column1 , column2, cnt) AS
(
SELECT 'kla', 'k', 10
UNION ALL
SELECT 'kle', 'm', 30
UNION ALL
SELECT 'foo', 'k', 10
UNION ALL
SELECT 'bar', 'm', 30
UNION ALL
SELECT 'bar', 'k', 20
UNION ALL
SELECT 'foo', 'm', 15
UNION ALL
SELECT 'foo', 'p', 10
),
tt AS (select  column1, column2, COUNT(*) AS cnt from t GROUP BY GROUPING SETS( (column1), (column2))  )

(SELECT column1, NULL as column2, cnt FROM tt WHERE column1 IS NOT NULL ORDER BY cnt desc LIMIT 3)
UNION ALL
(SELECT NULL as column1, column2, cnt FROM tt WHERE column2 IS NOT NULL ORDER BY cnt desc LIMIT 3)

Upvotes: 0

Related Questions