Olivia Lundy
Olivia Lundy

Reputation: 127

Selecting the last day price by using LAG() function

I have a user transaction table which is having a userid, date, itemid, price and prev_day_price column. Sample is:

userid date itemid price prev_day_price
1 2020-12-26 archicad 1400.0
1 2020-12-26 archicad 1400.0
1 2020-12-24 archicad 1200.0
1 2020-12-23 archicad 1240.0
1 2020-12-23 archicad 1240.0
1 2020-12-21 archicad 1100.0

I need to find the previous (last) day price for each item. I'd like to apply the lag function but also a group by so I can find the previous price by userid and itemid. As my table some days having a more than one rows\price by item and it doesnt goes on 24-12-2020 - 25-12-2020 - 26-12-2020.

userid date itemid price prev_day_price
1 2020-12-26 archicad 1400.0
1 2020-12-26 archicad 1400.0

Now, its starts to confusing and melting my mind to using lag function. Before the update prev_day_price column i'd tried to see preview results by lag function:

SELECT*,
  lag(price) OVER (PARTITION BY itemid, date, userid) AS prev_day_price 
FROM userlog

But result is:

userid date itemid price prev_day_price prev_day_price
1 2020-12-21 archicad 1100.0
1 2020-12-24 archicad 1200.0
1 2020-12-24 archicad 1200.0 1200.0
1 2020-12-24 archicad 1200.0 1200.0
1 2020-12-26 archicad 1400.0
1 2020-12-26 archicad 1400.0 1400.0

It sems my query doesnt work properly. I think so i missed something important. My expected result like:

userid date itemid price prev_day_price
1 2020-12-26 archicad 1400.0 1200
1 2020-12-26 archicad 1400.0 1200
1 2020-12-24 archicad 1200.0 1240
1 2020-12-23 archicad 1240.0 1100
1 2020-12-23 archicad 1240.0 1100
1 2020-12-21 archicad 1100.0 1100

I tried with lag(price,1) instead of lag(price) to offset by 0 and set default to 0 but it does not work either. The output format is not too important as long as I can get it into an array in the end. I'm using Postgres. Thanks in advance.

Upvotes: 2

Views: 1674

Answers (1)

S-Man
S-Man

Reputation: 23726

Why doesn't your solution work properly?

lag(), as well as every other window function, work on the window you are defining. In your case you define a partitioned window, a group, so to speak. The lag() function is executed only within this group, not over the groups. So, it returns the previous values within the partition. E.g. for the 2020-12-26 it return NULL for the first record (as there is no previous record before the first one) and the value for the first record in the second one. But this happens separately within every date group. This explains your result.


Solution for Postgres 11+ :

demo:db<>fiddle

SELECT
    *,
    first_value(price) OVER (
        ORDER BY itemid, userid, mydate 
        GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) AS prev_day_price 
FROM userlog

You have to define another window, on which you can act. Instead of windowing all dates separately, it is a great idea to bundle all records from the current and the previous date group. This is exactly what the GROUPS window was made for. Withing this group you can order your records by date and take the first value. This is exactly what you expect.


Solution for Postgres versions <11 :

(Neither GROUPS windows nor PRECEDING != UNBOUNDED are supported)

step-by-step demo:db<>fiddle

SELECT
    u.itemid,
    u.mydate,
    u.userid,
    s.price                                          -- 5
FROM userlog u
JOIN (                                               -- 4
    SELECT
        itemid, mydate, userid,
        COALESCE(                                    -- 3
             lag(price) OVER (PARTITION BY itemid, userid ORDER BY mydate),  -- 2
             price
        ) as price
    FROM (
        SELECT
            itemid, mydate, userid,
            MAX(price) as price                      -- 1
        FROM userlog
        GROUP BY itemid, mydate, userid
    ) s
) s USING (itemid, mydate, userid)
  1. Reduce all groups to one single records, e.g. with group and aggregate. Another opportunity could be using SELECT DISTINCT ON (itemid, mydate, userid)
  2. Shift the previous price within each group (well, only the itemid/userid groups, the mydate column must be only used for ordering!) using the lag() function
  3. Because the first record has no previous one, the current will be taken. This is what the COALESCE() function does.
  4. Join this result to your original table and...
  5. ... return the "lagged" price from it.

Upvotes: 4

Related Questions