Reputation: 41855

MODE aggregation function with redshift

I have the following tables:

customers

customer_id name
----------------
1           bob
2           alice
3           tim

purchases

id customer_id item_bought
--------------------------
1  1           hat
2  1           shoes
3  2           glasses
3  2           glasses
4  2           book
5  3           shoes
6  1           hat

And I want to following result:

customer_name item_bought_most_often
------------------------------------
bob           hat
alice         glasses
tim           shoes

I would do this like this (not actually tried, just the idea):

SELECT customer.name as customer_name,
  MODE(item_bought) as item_bought_most_ofen
FROM customers
INNER JOIN purchases USING (customer_id)
GROUP_BY customer_id

However, the MODE aggregation function doesn't exist in Redshift.

It seems that Redshift user defined functions are just regular scalar functions, not aggregations functions. So I don't think I can define it myself.

Any workaround ?

Upvotes: 15

Answers (2)

Lukasz Szozda

Reputation: 175556

You could first COUNT each person purchases and then use RANK() windowed function:

SELECT name AS customer_name, item_bought AS item_bought_most_often
FROM(SELECT name,item_bought,RANK() OVER(PARTITION BY name ORDER BY cnt DESC) rnk
     FROM (SELECT c.name, p.item_bought, COUNT(*) AS cnt
           FROM customers c
           JOIN purchases p
             ON p.customer_id = c.customer_id
           GROUP BY c.name, p.item_bought) AS s1) AS s2
WHERE rnk = 1;

LiveDemo

Output:

╔═══════════════╦════════════════════════╗
║ customer_name ║ item_bought_most_often ║
╠═══════════════╬════════════════════════╣
║ alice         ║ glasses                ║
║ bob           ║ hat                    ║
║ tim           ║ shoes                  ║
║ zoe           ║ pencil                 ║
║ zoe           ║ book                   ║
╚═══════════════╩════════════════════════╝

Note:

RANK will handle multiple most common values.

EDIT:

As Lukas Eder mentioned it could be further simplified:

SELECT name AS customer_name, item_bought AS  item_bought_most_often
FROM(SELECT name,item_bought,
            RANK() OVER(PARTITION BY name ORDER BY COUNT(*) DESC) rnk
     FROM customers c
     JOIN purchases p
       ON p.customer_id = c.customer_id
     GROUP BY c.name, p.item_bought)  AS s2
WHERE rnk = 1;

db<>fiddle demo

Upvotes: 8

Gordon Linoff

Reputation: 1269445

You can mimic mode() by using row_number():

select name, item_bought
from (select c.name, p.item_bought, count(*) as cnt,
             row_number() over (partition by c.name order by count(*) desc) as seqnum
      from customers c join
           purchases p
           using (customer_id)
      group by c.name, p.item_bought
     ) cp
where seqnum = 1;

Upvotes: 10

MODE aggregation function with redshift

Answers (2)

Related Questions