zeroes_ones
zeroes_ones

Reputation: 191

SQL: how to average across groups, while taking a time constraint into account

I have a table named orders in a Postgres database that looks like this:

customer_id    order_id    order_date    price    product
1              2           2021-03-05    15       books
1              13          2022-03-07    3        music
1              14          2022-06-15    900      travel
1              11          2021-11-17    25       books
1              16          2022-08-03    32       books
2              4           2021-04-12    4        music
2              7           2021-06-29    9        music
2              20          2022-11-03    8        music
2              22          2022-11-07    575      travel
2              24          2022-11-20    95       food
3              3           2021-03-17    25       books
3              5           2021-06-01    650      travel
3              17          2022-08-17    1200     travel
3              19          2022-10-02    6        music
3              23          2022-11-08    70       food
4              9           2021-08-20    3200     travel
4              10          2021-10-29    2750     travel
4              15          2022-07-15    1820     travel
4              21          2022-11-05    8000     travel
4              25          2022-11-29    27       books
5              1           2021-01-04    3        music
5              6           2021-06-09    820      travel
5              8           2021-07-30    19       books
5              12          2021-12-10    22       music
5              18          2022-09-19    20       books

Here's a SQL Fiddle: http://sqlfiddle.com/#!17/262fc/1

I'd like to return the average money spent by customers per product, but only consider orders within the first 12 months of a given customer's first purchase within the given product group. (yes, this is challenging!)

For example, for customer 1, order ID 2 and order ID 11 would be factored into the average for books(because order ID 11 took place less than 12 months after customer 1's first order for books, which was order ID 2), but order ID 16 would not be factored into the average (because 8/3/22 is more than 12 months from customer 1's first purchase for books, which took place on 3/5/21).

Here is a matrix showing which orders would be included within a given product (denoted by "yes"):

enter image description here

The desired output would look as follows:

         average_spent
books    22.20
music    7.83
travel   1530.71 
food     82.50

How would I do this?

Thanks in advance for any assistance you can give!

Upvotes: 2

Views: 73

Answers (1)

Ajax1234
Ajax1234

Reputation: 71471

You can use a subquery to check whether or not to include a product's price in the summation:

select o.product, sum(o.price)/count(*) val from orders o 
where o.order_date < (select min(o1.order_date) from orders o1 where 
      o1.product = o.product and o.user_id = o1.user_id) + interval '12 months' 
group by o.product

See fiddle

Upvotes: 1

Related Questions