hemera
hemera

Reputation: 23

Finding the best selling item for each of your best customers in SQL

I have 3 tables: Products (prod_id, prod_name, prod_price), Customers (cust_id, address, email, name), and Sales (sale_id, cust_id, prod_id).

I wrote a query to find the top 100 customers (in terms of money spent):

SELECT TOP 100 C.name AS customer_name,
SUM(P.prod_price) AS total_spent
FROM customers C
LEFT JOIN Sales S ON C.cust_id=S.cust_id
LEFT JOIN Products P ON S.prod_id=P.prod_id
GROUP BY C.cust_id
ORDER BY total_spent DESC;

Now I would like to write a query that will tell me the top seller for each of those 100 customers but I can't figure it out. I tried to write a nested query, but it is not working:

SELECT P.prod_name as best_seller,
C.name AS customer_name
FROM Product P
LEFT JOIN Sales S ON P.prod_id=S.prod_id
LEFT JOIN Customers C on S.cust_id = C.cust_id
GROUP BY C.name, P.prod_name
WHERE C.name IN
    (SELECT C.name AS customer_name
     FROM customers C
     LEFT JOIN Sales S ON C.cust_id=S.cust_id
     LEFT JOIN Products P ON S.prod_id=P.prod_id
     GROUP BY C.cust_id
     ORDER BY total_spent DESC
     LIMIT 100);

I am ultimately going to run these queries on a 10gb dataset using sql, but I have been testing them in SQlite on a really small subset so that I don't have to spin up an instance until I'm pretty sure they work (hence the LIMIT 100 syntax). In SQLite, the only error I get is "near "WHERE": syntax error".

Upvotes: 0

Views: 154

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269503

I am thinking something more along these lines:

SELECT TOP (100) cp.*
FROM (SELECT c.cust_id, c.name, p.prod_id, p.prod_name, 
             SUM(P.prod_price) AS product_spend,
             ROW_NUMBER() OVER (PARTITION BY c.cust_id ORDER BY COUNT(*) DESC) as seqnum,
             SUM(SUM(p.prod_price)) OVER (PARTITION BY c.cust_id) as total_spent
      FROM customers C JOIN
           Sales S
           ON C.cust_id = S.cust_id JOIN
           Products P
           ON S.prod_id = P.prod_id
      GROUP BY C.cust_id, p.prod_id, p.prod_name
     ) cp
WHERE seqnum = 1
ORDER BY total_spent DESC;

This aggregates once by customer and product. It uses window functions to get the total per customer and then filters and sorts in the outer query.

Use COUNT(*) in the ORDER BY to get "bought the most of". Use SUM(p.prod_price) to get "spent the most on".

Upvotes: 1

Related Questions