Reputation: 1210
I have the sample data below and I am looking to populate the value in the expectedOutput column. I want it to be calculated according to the rule -- if isValid = 1, give me the fruit value the last time isValid = 1. So in this example, row 1 is null because that is the first value. Row 6 shows apple because this row is valid and apple was the fruit the last time there was a valid value.
row fruit isValid expectedOutput (Prior valid value)
1 apple 1 NULL
2 apple 0 NULL
3 apple 0 NULL
4 apple 0 NULL
5 orange 0 NULL
6 orange 1 apple
7 grape 1 orange
8 grape 0 NULL
Here's what I've tried, doesn't quite get me the right result.
LAST_VALUE(case when isValid = 1 then fruit end IGNORE NULLS) OVER(PARTITION BY all ORDER BY row)
Upvotes: 2
Views: 2397
Reputation: 32700
You need to add bounds to your window to exclude the current row :
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
Try this:
WITH
table1 AS (
SELECT 1 as row, 'apple' as fruit, 1 as isValid union all
SELECT 2 as row, 'apple' as fruit, 0 as isValid union all
SELECT 3 as row, 'apple' as fruit, 0 as isValid union all
SELECT 4 as row, 'apple' as fruit, 0 as isValid union all
SELECT 5 as row, 'orange' as fruit, 0 as isValid union all
SELECT 6 as row, 'orange' as fruit, 1 as isValid union all
SELECT 7 as row, 'grape' as fruit, 1 as isValid union all
SELECT 8 as row, 'grape' as fruit, 0 as isValid
)
SELECT
*,
CASE
WHEN isValid = 1 THEN LAST_VALUE(IF(isValid=1, fruit, NULL) IGNORE NULLS) OVER(ORDER BY row ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
END AS expectedOutput
FROM
table1
Upvotes: 3
Reputation: 173171
See example below
#standardSQL
with `project.dataset.table` as (
select 1 row , 'apple' fruit, 1 isValid union all
select 2, 'apple', 0 union all
select 3, 'apple', 0 union all
select 4, 'apple', 0 union all
select 5, 'orange', 0 union all
select 6, 'orange', 1 union all
select 7, 'grape', 1 union all
select 8, 'grape', 0
)
select *,
if(isValid = 1, lag(fruit) over(partition by isValid order by row), null) as Prior_valid_value
from `project.dataset.table`
# order by row
with output
Upvotes: 1