RFBM
RFBM

Reputation: 1

SQL Server : COUNT of a SUM with varying GROUP BY

I am working on an Azure SQL database.

Say I have a table of order data with two columns customer_id and items_purchased:

| customer_id | items_purchased |
+-------------+-----------------+
| 1           | 1               |
| 2           | 2               |
| 3           | 1               |
| 1           | 4               |
| 2           | 3               |
| 4           | 3               |
| 5           | 2               |
| 6           | 2               |
| 6           | 1               |

I would like to know how many customers have bought a total of 1 item over their lifetime, how many have bought 2 items etc.

So for the above table, my desired result would be:

| total_items_purchased  | number_of_customers |
+------------------------+---------------------+
| 1                      | 1                   |
| 2                      | 1                   |
| 3                      | 2                   |
| 4                      | 0                   |
| 5                      | 2                   |

1 customer has a lifetime total of 1 item, 2 customers have a lifetime total of 3 items etc.

The closest I have got is an output of how many items each customer has bought using the below:

SELECT customer_id, SUM(items_purchased) 
FROM orderdata
GROUP BY customer_id;

But obviously this needs to be further grouped or something to get the result I need.

Can anyone help?

Thanks,

Ross

Upvotes: 0

Views: 68

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270421

I think you want two levels of aggregation:

select num_items, count(*) as num_customers
from (select customer_id, sum(items_purchased) as num_items
      from t
      group by customer_id
     ) c
group by num_items
order by num_items;

Note: This does not provide counts of 0.

If you want counts of 0, you need to generate the numbers. One method is a recursive CTE:

with i as (
      select num_items, count(*) as num_customers
      from (select customer_id, sum(items_purchased) as num_items
            from t
            group by customer_id
           ) c
      group by num_items
     ),
     n as (
      select 1 as n, max(num_items) as maxn
      from i
      union all
      select n + 1, maxn
      from n
      where n < maxn
    )
select n.n, coalesce(i.num_customers, 0) as num_customers
from n left join
     i
     on n.n = i.num_items
with option (maxrecursion 0);

Upvotes: 2

Related Questions