Reputation: 127
I have a user transaction table which is having a userid, date, itemid, price and prev_day_price column. Sample is:
userid | date | itemid | price | prev_day_price |
---|---|---|---|---|
1 | 2020-12-26 | archicad | 1400.0 | |
1 | 2020-12-26 | archicad | 1400.0 | |
1 | 2020-12-24 | archicad | 1200.0 | |
1 | 2020-12-23 | archicad | 1240.0 | |
1 | 2020-12-23 | archicad | 1240.0 | |
1 | 2020-12-21 | archicad | 1100.0 |
I need to find the previous (last) day price for each item. I'd like to apply the lag function but also a group by so I can find the previous price by userid and itemid. As my table some days having a more than one rows\price by item and it doesnt goes on 24-12-2020 - 25-12-2020 - 26-12-2020.
userid | date | itemid | price | prev_day_price |
---|---|---|---|---|
1 | 2020-12-26 | archicad | 1400.0 | |
1 | 2020-12-26 | archicad | 1400.0 |
Now, its starts to confusing and melting my mind to using lag function. Before the update prev_day_price column i'd tried to see preview results by lag function:
SELECT*,
lag(price) OVER (PARTITION BY itemid, date, userid) AS prev_day_price
FROM userlog
But result is:
userid | date | itemid | price | prev_day_price | prev_day_price |
---|---|---|---|---|---|
1 | 2020-12-21 | archicad | 1100.0 | ||
1 | 2020-12-24 | archicad | 1200.0 | ||
1 | 2020-12-24 | archicad | 1200.0 | 1200.0 | |
1 | 2020-12-24 | archicad | 1200.0 | 1200.0 | |
1 | 2020-12-26 | archicad | 1400.0 | ||
1 | 2020-12-26 | archicad | 1400.0 | 1400.0 |
It sems my query doesnt work properly. I think so i missed something important. My expected result like:
userid | date | itemid | price | prev_day_price |
---|---|---|---|---|
1 | 2020-12-26 | archicad | 1400.0 | 1200 |
1 | 2020-12-26 | archicad | 1400.0 | 1200 |
1 | 2020-12-24 | archicad | 1200.0 | 1240 |
1 | 2020-12-23 | archicad | 1240.0 | 1100 |
1 | 2020-12-23 | archicad | 1240.0 | 1100 |
1 | 2020-12-21 | archicad | 1100.0 | 1100 |
I tried with lag(price,1) instead of lag(price) to offset by 0 and set default to 0 but it does not work either. The output format is not too important as long as I can get it into an array in the end. I'm using Postgres. Thanks in advance.
Upvotes: 2
Views: 1674
Reputation: 23726
Why doesn't your solution work properly?
lag()
, as well as every other window function, work on the window you are defining. In your case you define a partitioned window, a group, so to speak. The lag()
function is executed only within this group, not over the groups. So, it returns the previous values within the partition. E.g. for the 2020-12-26
it return NULL
for the first record (as there is no previous record before the first one) and the value for the first record in the second one. But this happens separately within every date
group. This explains your result.
Solution for Postgres 11+ :
SELECT
*,
first_value(price) OVER (
ORDER BY itemid, userid, mydate
GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW
) AS prev_day_price
FROM userlog
You have to define another window, on which you can act. Instead of windowing all dates separately, it is a great idea to bundle all records from the current and the previous date group. This is exactly what the GROUPS
window was made for. Withing this group you can order your records by date and take the first value. This is exactly what you expect.
Solution for Postgres versions <11 :
(Neither GROUPS
windows nor PRECEDING != UNBOUNDED
are supported)
SELECT
u.itemid,
u.mydate,
u.userid,
s.price -- 5
FROM userlog u
JOIN ( -- 4
SELECT
itemid, mydate, userid,
COALESCE( -- 3
lag(price) OVER (PARTITION BY itemid, userid ORDER BY mydate), -- 2
price
) as price
FROM (
SELECT
itemid, mydate, userid,
MAX(price) as price -- 1
FROM userlog
GROUP BY itemid, mydate, userid
) s
) s USING (itemid, mydate, userid)
SELECT DISTINCT ON (itemid, mydate, userid)
price
within each group (well, only the itemid
/userid
groups, the mydate
column must be only used for ordering!) using the lag()
functionCOALESCE()
function does.price
from it.Upvotes: 4