Reputation: 4565
I have a table that shows purchases of customers like this:
day,transaction_id,customer_id,price
2020-01-01,1,100,10
2020-01-02,2,200,20
2020-01-03,3,100,30
2020-01-04,4,200,40
2020-02-01,5,100,50
2020-02-02,6,200,60
2020-02-03,7,100,70
2020-02-04,8,200,80
I want a query that shows the sum of all transactions to date per user, but only for the transactions in February 2020.
If I do something like the following, then I believe only transactions in February are evaluated in the window function:
SELECT
transaction_id,
SUM(price) OVER (PARTITION BY customer_id ORDER BY day) AS total
FROM
sales
WHERE
day BETWEEN DATE '2021-02-01' AND DATE '2021-02-28'
How can I compute this window function only for rows in February, but include all previous rows within the window function calculation?
This is just sample data, but in the actual data, there are a huge amount of sales, so there is a lot of redundant calculations of the window function for all rows outside of the specified timeframe.
EDIT:
I think my question was misunderstood as both answers so far reflect the opposite of what I want.
I want to return only trips in February, but the window function should make its calculation on ALL trips in January and February.
The result should look like this:
transaction_id,total
5,90
6,120
7,160
8,200
Upvotes: 0
Views: 536
Reputation: 1
I think I got your question. I used T-SQL and created temp table and subquery
declare @from as date ='2020-02-01'
declare @to as date='2020-02-28'
select * from sales
select price,
(SELECT SUM(b.price)
FROM sales b
WHERE b.customer_id = a.customer_id and b.transaction_id <= a.transaction_id) as runningTotal,
transaction_id,
customer_id,
day
INTO #Temp from sales a
select * from #Temp where [day] between @from and @to order by customer_id, transaction_id
drop table #temp
Upvotes: 0
Reputation: 1271231
I want a query that shows the sum of all transactions to date per user, but only for the transactions in February 2020.
If I understand correctly, you only want to show the transactions in February but you want the sum to be all transactions. If so, then use a subquery:
select s.*
from (select s.*,
sum(price) over (partition by customer_id order by day) as running_price
from sales s
) s
where day >= date '2021-02-01' and date < date '2021-03-01';
Note the date comparison using the first of the months and inequalities. This will work for both 2020 (a leap year) and 2021 (not a leap year) -- and ironically both years are mentioned in your question.
Upvotes: 1
Reputation: 522817
Use a conditional summation:
SELECT
transaction_id,
SUM(CASE WHEN day BETWEEN DATE '2021-02-01' AND DATE '2021-02-28'
THEN price ELSE 0 END) OVER (PARTITION BY customer_id ORDER BY day) AS total
FROM sales
Upvotes: 0