Reputation: 101
I have a database containing sales transactions. These are in the following (simplified) format:
sales_id | customer_id | sales_date | number_of_units | total_price
The goal for my query is for each of these transactions, to get the number of sales that this specific customer_id made before the current record, during the whole history of this database, but also during the 365 days before the current record.
Lifetime sales works right now, but the last 365 days part has me stuck. My query right now can identify IF a record had at least one sale in the previous 365 days, and I do it like so:
SELECT sales_id ,customer_id,sales_date,number_of_units,total_price,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY sales_date ASC) as 'LifeTimeSales' ,
CASE WHEN DATEDIFF(DAY,sales_date,LAG(sales_date, 1) OVER (PARTITION BY customer_id ORDER BY sales_date ASC)) > -365
THEN 1 ELSE 0 END as 'Last365Sales'
FROM sales_db
+ some non-important WHERE clauses. After which I aggregate the result of this query in some other ways.
But this does not tell me if this purchase is for example the 4th sale in the previous 365 days of a customer.
Note: This is a query that runs daily on the full database with 6 million records and growing. I drop and recreate this table right now, which is obviously not efficient. Updating the table when new sales come in would be ideal, but right now this is not possible to create. Any ideas?
Some test data:
sales_id,customer_id,sales_date,number_of_units,total_price
1001,2001,2016-01-01,1,86
1002,2001,2016-08-01,3,98
1003,2001,2017-06-01,2,87
1004,2002,2017-06-01,2,15
+ expected result:
sales_id,customer_id,sales_date,number_of_units,total_price,LifeTimeSales,Last365Sales
1001,2001,2016-01-01,1,86,0,0
1002,2001,2016-08-01,3,98,1,1
1003,2001,2017-06-01,2,87,2,1
1004,2002,2017-06-01,2,15,0,0
Upvotes: 0
Views: 287
Reputation: 666
I create report view where all required fields are available. Select all that you need:
with all_history_statistics as
(select customer_id, sales_id, sales_date, number_of_units, total_price,
max(sales_date) over (partition by customer_id order by (select null)) as last_sale_date,
count(sales_id) over (partition by customer_id order by (select null)) total_number_of_sales,
count(sales_id) over (partition by customer_id order by sales_date asc ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) number_of_sales_for_current_date,
sum(number_of_units) over (partition by customer_id order by (select null)) total_number_saled_units,
sum(number_of_units) over (partition by customer_id order by sales_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) number_saled_units_for_current_date,
sum(total_price) over (partition by customer_id order by (select null)) as total_earned,
sum(total_price) over (partition by customer_id order by sales_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) earned_for_current_date)
from sales_db),
with last_year_statistics as
(select customer_id, sales_id, sales_date, number_of_units, total_price,
max(sales_date) over (partition by customer_id order by (select null)) as last_sale_date,
count(sales_id) over (partition by customer_id order by (select null)) total_number_of_sales,
count(sales_id) over (partition by customer_id order by sales_date asc ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) number_of_sales_for_current_date,
sum(number_of_units) over (partition by customer_id order by (select null)) total_number_saled_units,
sum(number_of_units) over (partition by customer_id order by sales_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) number_saled_units_for_current_date,
sum(total_price) over (partition by customer_id order by (select null)) as total_earned,
sum(total_price) over (partition by customer_id order by sales_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) earned_for_current_date)
from sales_db)
select <specify list of fields which you need>
from all_history_statistics t1 inner join last_year_statistics
on t1.customer_id = t2.cutomer_id
;
Upvotes: 0
Reputation: 37472
For the count of sales before a sale you could use correlated subqueries.
SELECT s1.sales_id,
s1.customer_id,
s1.sales_date,
s1.number_of_units,
s1.total_price,
(SELECT count(*)
FROM sales_db s2
WHERE s2.customer_id = s1.customer_id
AND s2.sales_date <= s1.sales_date) - 1 lifetimesales,
(SELECT count(*)
FROM sales_db s2
WHERE s2.customer_id = s1.customer_id
AND s2.sales_date <= s1.sales_date
AND s2.sales_date >= dateadd(day, s1.sales_date, -356)) - 1 last365sales
FROM sales_db s1;
(I used s2.sales_date <= s1.sales_date
and then subtracted 1 from the reuslt, so that multiple sales on the same day, if such data exists, are also counted. But as this also counts the sale of the current row, it has to be decremented by 1.)
Upvotes: 1