Reputation: 23
I have 3 tables: Products (prod_id, prod_name, prod_price), Customers (cust_id, address, email, name), and Sales (sale_id, cust_id, prod_id).
I wrote a query to find the top 100 customers (in terms of money spent):
SELECT TOP 100 C.name AS customer_name,
SUM(P.prod_price) AS total_spent
FROM customers C
LEFT JOIN Sales S ON C.cust_id=S.cust_id
LEFT JOIN Products P ON S.prod_id=P.prod_id
GROUP BY C.cust_id
ORDER BY total_spent DESC;
Now I would like to write a query that will tell me the top seller for each of those 100 customers but I can't figure it out. I tried to write a nested query, but it is not working:
SELECT P.prod_name as best_seller,
C.name AS customer_name
FROM Product P
LEFT JOIN Sales S ON P.prod_id=S.prod_id
LEFT JOIN Customers C on S.cust_id = C.cust_id
GROUP BY C.name, P.prod_name
WHERE C.name IN
(SELECT C.name AS customer_name
FROM customers C
LEFT JOIN Sales S ON C.cust_id=S.cust_id
LEFT JOIN Products P ON S.prod_id=P.prod_id
GROUP BY C.cust_id
ORDER BY total_spent DESC
LIMIT 100);
I am ultimately going to run these queries on a 10gb dataset using sql, but I have been testing them in SQlite on a really small subset so that I don't have to spin up an instance until I'm pretty sure they work (hence the LIMIT 100 syntax). In SQLite, the only error I get is "near "WHERE": syntax error".
Upvotes: 0
Views: 154
Reputation: 1269503
I am thinking something more along these lines:
SELECT TOP (100) cp.*
FROM (SELECT c.cust_id, c.name, p.prod_id, p.prod_name,
SUM(P.prod_price) AS product_spend,
ROW_NUMBER() OVER (PARTITION BY c.cust_id ORDER BY COUNT(*) DESC) as seqnum,
SUM(SUM(p.prod_price)) OVER (PARTITION BY c.cust_id) as total_spent
FROM customers C JOIN
Sales S
ON C.cust_id = S.cust_id JOIN
Products P
ON S.prod_id = P.prod_id
GROUP BY C.cust_id, p.prod_id, p.prod_name
) cp
WHERE seqnum = 1
ORDER BY total_spent DESC;
This aggregates once by customer and product. It uses window functions to get the total per customer and then filters and sorts in the outer query.
Use COUNT(*)
in the ORDER BY
to get "bought the most of". Use SUM(p.prod_price)
to get "spent the most on".
Upvotes: 1