Hoff
Hoff

Reputation: 39896

postgres: get top n occurrences of a value within each group

I have a simple table like this:

user    letter
--------------
1       A
1       A
1       B
1       B
1       B
1       C

2       A
2       B
2       B
2       C
2       C
2       C

I want to get the top 2 occurrences of 'letter' per user, like so

user    letter  rank(within user group)
--------------------
1       B       1
1       A       2

2       C       1
2       B       2

or even better: collapsed into columns

user    1st-most-occurrence  2nd-most-occurrence
1       B                   A
2       C                   B

How can I accomplish this in postgres?

Upvotes: 3

Views: 2783

Answers (3)

Frederic Leitenberger
Frederic Leitenberger

Reputation: 2019

Needed function:

CREATE OR REPLACE FUNCTION sortCountLimitOffset(anyarray, int, int)
  RETURNS anyarray AS 'select array_agg(x) from (select x from (select unnest($1) as x) as t group by x order by count(*) desc offset $2 limit $3) t;'
  LANGUAGE sql VOLATILE
  COST 100;

Solution 1: (returning all letters concatenated as string)

select
    usr,
    array_to_string(sortCountLimitOffset(array_agg(letter), 0, 5), ',')
from ttt
group by usr;

Output:

 usr | array_to_string
-----+-----------------
   1 | B,A,C
   2 | C,B,A
(2 Zeilen)

Solution 2: (returning each n-th letter in separate column)

select
    usr,
    array_to_string(sortCountLimitOffset(array_agg(letter), 0, 1), ',') letter1,
    array_to_string(sortCountLimitOffset(array_agg(letter), 1, 1), ',') letter2,
    array_to_string(sortCountLimitOffset(array_agg(letter), 2, 1), ',') letter3,
    array_to_string(sortCountLimitOffset(array_agg(letter), 3, 1), ',') letter4,
    array_to_string(sortCountLimitOffset(array_agg(letter), 4, 1), ',') letter5
from ttt
group by usr;

Output:

 usr | letter1 | letter2 | letter3 | letter4 | letter5
-----+---------+---------+---------+---------+---------
   1 | B       | A       | C       |         |
   2 | C       | B       | A       |         |
(2 Zeilen)

It would also be possible to inline the SELECT from the function where the function is called. But the way it is now, it is easier to reuse and maintain the code.

Upvotes: 0

roman
roman

Reputation: 117561

with cte as (
    select 
        t.user_id, t.letter,
        row_number() over(partition by t.user_id order by count(*) desc) as row_num
    from Table1 as t
    group by t.user_id, t.letter
)
select
    c.user_id,
    max(case when c.row_num = 1 then c.letter end) as "1st-most-occurance",
    max(case when c.row_num = 2 then c.letter end) as "2st-most-occurance"
from cte as c
where c.row_num <= 2
group by c.user_id

=> sql fiddle demo

Upvotes: 1

user330315
user330315

Reputation:

Something like this:

select *
from (
    select userid, 
           letter, 
           dense_rank() over (partition by userid order by count(*) desc) as rnk
    from letters
    group by userid, letter
) t
where rnk <= 2
order by userid, rnk;

Note that I replaced user with userid because using reserved words for columns is a bad habit.

Here is an SQLFiddle: http://sqlfiddle.com/#!12/ec3ec/1

Upvotes: 2

Related Questions