Reputation: 1851
I have a tables of dates where I need to store the previous row's val
column in the current row. However, I cannot use LAG
clause because there are dates that are missing and pulling the previous value and storing it in the current row should only happen if the immediate date previous exists. Thus I must (I think) use an OVER()
and RANGE
clause
For example, imagine the following table:
date | val
-----------------
12/1/2020 | 1
12/2/2020 | 2
12/4/2020 | 3
12/5/2020 | 4
I'd like to be able to add an additional column with the value from the day before, and if the previous day doesn't exist, then store a 0
date | val | prev_val
--------------------------
12/1/2020 | 1. | 0
12/2/2020 | 2. | 1
12/4/2020 | 3. | 0 <--- notice how 12/3/2020 doesn't exist? that's why we store a 0
12/5/2020 | 4. | 3
To populate prev_val
, I figured I could do something like the following in my SELECT
statement:
SUM(val) OVER(ORDER BY UNIX_DATE(date) RANGE BETWEEN 1 PRECEDING AND NOT CURRENT ROW) as prev_val
Obviously NOT
does not exist, but that's the concept I'd have to go for right? I don't know the correct syntax to use to just get the single previous value and I cannot use ROW
as that would not account for missing dates. What am I missing? I'm using Google BigQuery.
Upvotes: 2
Views: 1634
Reputation: 172993
Consider also below less verbose option
select *,
if(date - 1 = lag(date) over prev_rows, lag(val) over prev_rows, 0) prev_val
from `project.dataset.table`
window prev_rows as (order by date)
If applied to sample data in your question
with `project.dataset.table` as (
select date '2020-12-01' date, 1 val union all
select '2020-12-02', 2 union all
select '2020-12-04', 3 union all
select '2020-12-05', 4
)
output is
Upvotes: 0
Reputation: 1269873
You don't need lag()
for this. You can use a window frame:
select t.*,
(val -
max(val) over (order by unix(date)
range between 1 preceding and 1 preceding
)
) as diff
from t;
This returns NULL
if the value doesn't exist. You can use coalesce()
to handle that:
select t.*,
coalesce( (val -
max(val) over (order by unix(date)
range between 1 preceding and 1 preceding
)
), 0
) as diff
from t;
Upvotes: 0
Reputation: 521289
You could use LAG
along with a CASE
expression which conditionally renders either the previous value of the immediately preceding day, or zero in the event that this day does not exist:
SELECT
date,
val,
CASE WHEN DATE_SUB(date, INTERVAL 1 DAY) = LAG(date) OVER (ORDER BY date)
THEN LAG(val) OVER (ORDER BY date)
ELSE 0 END AS prev_val
FROM yourTable
ORDER BY
date;
Upvotes: 3