drworm
drworm

Reputation: 13

SQL How to select customers with highest transaction amount by state

I am trying to write a SQL query that returns the name and purchase amount of the five customers in each state who have spent the most money.

Table schemas

customers
|_state
|_customer_id
|_customer_name

transactions
|_customer_id
|_transact_amt

Attempts look something like this

SELECT state, Sum(transact_amt) AS HighestSum
FROM (
    SELECT name, transactions.transact_amt, SUM(transactions.transact_amt) AS HighestSum
    FROM customers
    INNER JOIN customers ON transactions.customer_id = customers.customer_id
    GROUP BY state
) Q
GROUP BY transact_amt
ORDER BY HighestSum 

I'm lost. Thank you.

Expected results are the names of customers with the top 5 highest transactions in each state.

ERROR:  table name "customers" specified more than once
SQL state: 42712

Upvotes: 0

Views: 12734

Answers (3)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657052

There are two valid answers already. Here's a third:

SELECT *
FROM  (
   SELECT c.state, c.customer_name, t.*
        , row_number() OVER (PARTITION BY c.state ORDER BY t.transact_sum DESC NULLS LAST, customer_id) AS rn
   FROM  (
      SELECT customer_id, sum(transact_amt) AS transact_sum
      FROM   transactions
      GROUP  BY customer_id
      ) t
   JOIN   customers c USING (customer_id)
   ) sub
WHERE  rn < 6
ORDER  BY state, rn;

Major points

  • When aggregating all or most rows of a big table, it's typically substantially faster to aggregate before the join. Assuming referential integrity (FK constraints), we won't be aggregating rows that would be filtered otherwise. This might change from nice-to-have to a pure necessity when joining to more aggregated tables. Related:

  • Add additional ORDER BY item(s) in the window function to define which rows to pick from ties. In my example, it's simply customer_id. If you have no tiebreaker, results are arbitrary in case of a tie, which may be OK. But every other execution might return different results, which typically is a problem. Or you include all ties in the result. Then we are back to rank() instead of row_number(). See:

  • While transact_amt can be NULL (has not been ruled out) any sum may end up to be NULL as well. With an an unsuspecting ORDER BY t.transact_sum DESC those customers come out on top as NULL comes first in descending order. Use DESC NULLS LAST to avoid this pitfall. (Or define the column transact_amt as NOT NULL.)

Upvotes: 1

The Impaler
The Impaler

Reputation: 48810

You can get them using aggregation and then by using the RANK() window function. For example:

select
  state,
  rk,
  customer_name
from (
  select
    *,
    rank() over(partition by state order by total desc) as rk
  from (
    select
      c.customer_id,
      c.customer_name,
      c.state,
      sum(t.transact_amt) as total
    from customers c
    join transactions t on t.customer_id = c.customer_id
    group by c.customer_id
  ) x
) y
where rk <= 5
order by state, rk

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269993

First, you need for your JOIN to be correct. Second, you want to use window functions:

SELECT ct.*
FROM (SELECT c.customer_id, c.name, c.state, SUM(t.transact_amt) AS total,
             ROW_NUMBER() OVER (PARTITION BY c.state ORDER BY SUM(t.transact_amt) DESC) as seqnum
      FROM customers c JOIN
           transaactions t
           ON t.customer_id = c.customer_id
      GROUP BY c.customer_id, c.name, c.state
     ) ct
WHERE seqnum <= 5;

You seem to have several issues with SQL. I would start with understanding aggregation functions. You have a SUM() with the alias HighestSum. It is simply the total per customer.

Upvotes: 1

Related Questions