Jordan Lewallen
Jordan Lewallen

Reputation: 1851

SQL - LAG equivalent using OVER clause to retrieve previous value and store in current row?

I have a tables of dates where I need to store the previous row's val column in the current row. However, I cannot use LAG clause because there are dates that are missing and pulling the previous value and storing it in the current row should only happen if the immediate date previous exists. Thus I must (I think) use an OVER() and RANGE clause

For example, imagine the following table:

date      | val
-----------------
12/1/2020 | 1
12/2/2020 | 2
12/4/2020 | 3
12/5/2020 | 4

I'd like to be able to add an additional column with the value from the day before, and if the previous day doesn't exist, then store a 0

date      | val | prev_val
--------------------------
12/1/2020 | 1.  | 0
12/2/2020 | 2.  | 1
12/4/2020 | 3.  | 0 <--- notice how 12/3/2020 doesn't exist? that's why we store a 0
12/5/2020 | 4.  | 3

To populate prev_val, I figured I could do something like the following in my SELECT statement:

SUM(val) OVER(ORDER BY UNIX_DATE(date) RANGE BETWEEN 1 PRECEDING AND NOT CURRENT ROW) as prev_val

Obviously NOT does not exist, but that's the concept I'd have to go for right? I don't know the correct syntax to use to just get the single previous value and I cannot use ROW as that would not account for missing dates. What am I missing? I'm using Google BigQuery.

Upvotes: 2

Views: 1634

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Consider also below less verbose option

select *,
  if(date - 1 = lag(date) over prev_rows, lag(val) over prev_rows, 0) prev_val 
from `project.dataset.table` 
window prev_rows as (order by date)   

If applied to sample data in your question

with `project.dataset.table` as (
  select date '2020-12-01' date, 1 val union all
  select '2020-12-02', 2 union all
  select '2020-12-04', 3 union all
  select '2020-12-05', 4 
)

output is

enter image description here

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269873

You don't need lag() for this. You can use a window frame:

select t.*,
       (val -
        max(val) over (order by unix(date)
                       range between 1 preceding and 1 preceding
                      )
       ) as diff
from t;

This returns NULL if the value doesn't exist. You can use coalesce() to handle that:

select t.*,
       coalesce( (val -
                  max(val) over (order by unix(date)
                                 range between 1 preceding and 1 preceding
                                )
                 ), 0
               ) as diff
from t;

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521289

You could use LAG along with a CASE expression which conditionally renders either the previous value of the immediately preceding day, or zero in the event that this day does not exist:

SELECT
    date,
    val,
    CASE WHEN DATE_SUB(date, INTERVAL 1 DAY) = LAG(date) OVER (ORDER BY date)
         THEN LAG(val) OVER (ORDER BY date)
         ELSE 0 END AS prev_val
FROM yourTable
ORDER BY
    date;

Upvotes: 3

Related Questions