Reputation: 41855
I have the following tables:
customers
customer_id name
----------------
1 bob
2 alice
3 tim
purchases
id customer_id item_bought
--------------------------
1 1 hat
2 1 shoes
3 2 glasses
3 2 glasses
4 2 book
5 3 shoes
6 1 hat
And I want to following result:
customer_name item_bought_most_often
------------------------------------
bob hat
alice glasses
tim shoes
I would do this like this (not actually tried, just the idea):
SELECT customer.name as customer_name,
MODE(item_bought) as item_bought_most_ofen
FROM customers
INNER JOIN purchases USING (customer_id)
GROUP_BY customer_id
However, the MODE aggregation function doesn't exist in Redshift.
It seems that Redshift user defined functions are just regular scalar functions, not aggregations functions. So I don't think I can define it myself.
Any workaround ?
Upvotes: 15
Views: 11968
Reputation: 175556
You could first COUNT
each person purchases and then use RANK()
windowed function:
SELECT name AS customer_name, item_bought AS item_bought_most_often
FROM(SELECT name,item_bought,RANK() OVER(PARTITION BY name ORDER BY cnt DESC) rnk
FROM (SELECT c.name, p.item_bought, COUNT(*) AS cnt
FROM customers c
JOIN purchases p
ON p.customer_id = c.customer_id
GROUP BY c.name, p.item_bought) AS s1) AS s2
WHERE rnk = 1;
Output:
╔═══════════════╦════════════════════════╗
║ customer_name ║ item_bought_most_often ║
╠═══════════════╬════════════════════════╣
║ alice ║ glasses ║
║ bob ║ hat ║
║ tim ║ shoes ║
║ zoe ║ pencil ║
║ zoe ║ book ║
╚═══════════════╩════════════════════════╝
Note:
RANK
will handle multiple most common values.
EDIT:
As Lukas Eder mentioned it could be further simplified:
SELECT name AS customer_name, item_bought AS item_bought_most_often
FROM(SELECT name,item_bought,
RANK() OVER(PARTITION BY name ORDER BY COUNT(*) DESC) rnk
FROM customers c
JOIN purchases p
ON p.customer_id = c.customer_id
GROUP BY c.name, p.item_bought) AS s2
WHERE rnk = 1;
Upvotes: 8
Reputation: 1269445
You can mimic mode()
by using row_number()
:
select name, item_bought
from (select c.name, p.item_bought, count(*) as cnt,
row_number() over (partition by c.name order by count(*) desc) as seqnum
from customers c join
purchases p
using (customer_id)
group by c.name, p.item_bought
) cp
where seqnum = 1;
Upvotes: 10