Reputation: 7693
I have a data like as shown below
rno id day val
0 1 1 7
1 1 2 5
2 1 3 10
3 1 4 10
4 1 5 11
5 1 6 11
6 1 7 14
7 1 8 14
20 2 1 5
21 2 2 7
22 2 3 8
23 2 4 8
24 2 5 9
25 2 6 9
26 2 7 13
27 2 8 13
28 2 9 15
29 2 10 15
I would like to create a new column as fake_flag
and fill in values as fake_val
based on below two rules
rule 1 - for each val (n
), check whether the preceding two rows (n-1
,n-2
) are constant or decreasing (ex: 7,5 or 5,5 is valid whereas 5,7 is not valid because it is increasing and not constant as well) and get the max value as output. if it's 7,5, the output will be 7. if it's 5,5 then the output will be 5
rule 2 - check whether the current value (n
) and next value (n+1
) is more than max of rule 1 output by 3 or more points (>=3). Ex: if rule 1 output was 5, then we expect to see atleast 8 (n
),8(n+1
). It could be 9,9 or 10,10
I expect my output data to look like as shown below
rno id day val fake_flag
0 1 1 7
1 1 2 5
2 1 3 10 fake_val # >= 3 from max of preceding 2 rows and `n` and `n+1` is same
3 1 4 10
4 1 5 11
5 1 6 11
6 1 7 14 fake_val # >= 3 from max of preceding 2 rows and `n` and `n+1` is same
7 1 8 14
20 2 1 5
21 2 2 7
22 2 3 8
23 2 4 8
24 2 5 9
25 2 6 9
26 2 7 13 fake_val # >= 3 from max of preceding 2 rows and `n` and `n+1` is same
27 2 8 13
28 2 9 15
29 2 10 15
Upvotes: 1
Views: 4528
Reputation: 172974
Below is for BigQuery Standard SQL
#standardSQL
SELECT rno, id, day, val,
IF(IFNULL(val_prev2 > val_prev1, FALSE) -- rule 1
OR (
(val - GREATEST(val_prev2, val_prev1) >= 3) -- rule 2 for val(n)
AND (val_next - GREATEST(val_prev2, val_prev1) >= 3) -- rule 2 for val(n+1)
),
'fake_val', ''
) AS fake_flag
FROM (
SELECT *,
LAG(val) OVER(PARTITION BY id ORDER BY day) val_prev1,
LAG(val, 2) OVER(PARTITION BY id ORDER BY day) val_prev2,
LEAD(val) OVER(PARTITION BY id ORDER BY day) val_next
FROM `project.dataset.table`
)
If to apply to sample data from your question - result is
Row rno id day val fake_flag
1 0 1 1 7
2 1 1 2 5
3 2 1 3 10 fake_val
4 3 1 4 10
5 4 1 5 11
6 5 1 6 11
7 6 1 7 14 fake_val
8 7 1 8 14
9 20 2 1 5
10 21 2 2 7
11 22 2 3 8
12 23 2 4 8
13 24 2 5 9
14 25 2 6 9
15 26 2 7 13 fake_val
16 27 2 8 13
17 28 2 9 15
18 29 2 10 15
Upvotes: 3
Reputation: 1083
This should accomplish what you want. I tested it out with dummy data, but if I didn't understand some part let me know and I can amend.
Select *
, CASE WHEN
-- Rule 1
(LAG(val, 1) over w <= LAG(val, 2) over w) AND
(val = LEAD(val, 1) over w) AND -- n = n + 1, part of rule 2
-- Can assume row n-2 is the max because it will either be the same as row n-1 or greater than row n-1 for rule 1 to be satisfied
(LAG(val, 2) over w <= val + 3) -- Only have to check current row val because for first part of rule 2 to be satisfied val for row n must equal val for row n + 1
THEN 'fake_val' -- I would just have a 1 representing it is true and then 0 if not, but up to you
ELSE null
END as fake_flag
from Dataset.Table_name
WINDOW w as (ORDER BY rno ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
Upvotes: 3