The Great
The Great

Reputation: 7693

Compare preceding two rows with current and next row using Big Query

I have a data like as shown below

rno id day  val
0   1   1   7
1   1   2   5
2   1   3   10
3   1   4   10
4   1   5   11
5   1   6   11
6   1   7   14
7   1   8   14
20  2   1   5
21  2   2   7
22  2   3   8
23  2   4   8
24  2   5   9
25  2   6   9
26  2   7   13
27  2   8   13
28  2   9   15
29  2   10  15

I would like to create a new column as fake_flag and fill in values as fake_val based on below two rules

rule 1 - for each val (n), check whether the preceding two rows (n-1,n-2) are constant or decreasing (ex: 7,5 or 5,5 is valid whereas 5,7 is not valid because it is increasing and not constant as well) and get the max value as output. if it's 7,5, the output will be 7. if it's 5,5 then the output will be 5

rule 2 - check whether the current value (n) and next value (n+1) is more than max of rule 1 output by 3 or more points (>=3). Ex: if rule 1 output was 5, then we expect to see atleast 8 (n),8(n+1). It could be 9,9 or 10,10

I expect my output data to look like as shown below

rno id day  val fake_flag
0   1   1   7     
1   1   2   5     
2   1   3   10    fake_val  # >= 3 from max of preceding 2 rows and `n` and `n+1` is same 
3   1   4   10     
4   1   5   11
5   1   6   11
6   1   7   14    fake_val  # >= 3 from max of preceding 2 rows and `n` and `n+1` is same 
7   1   8   14
20  2   1   5
21  2   2   7
22  2   3   8
23  2   4   8
24  2   5   9
25  2   6   9
26  2   7   13    fake_val    # >= 3 from max of preceding 2 rows and `n` and `n+1` is same 
27  2   8   13
28  2   9   15
29  2   10  15

Upvotes: 1

Views: 4528

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

Below is for BigQuery Standard SQL

#standardSQL
SELECT rno, id, day, val, 
  IF(IFNULL(val_prev2 > val_prev1, FALSE)                   -- rule 1
    OR ( 
      (val - GREATEST(val_prev2, val_prev1) >= 3)           -- rule 2 for val(n)
      AND (val_next - GREATEST(val_prev2, val_prev1) >= 3)  -- rule 2 for val(n+1)
    ), 
    'fake_val', ''
  ) AS fake_flag
FROM (
  SELECT *,
    LAG(val) OVER(PARTITION BY id ORDER BY day) val_prev1,
    LAG(val, 2) OVER(PARTITION BY id ORDER BY day) val_prev2,
    LEAD(val) OVER(PARTITION BY id ORDER BY day) val_next
  FROM `project.dataset.table`
)

If to apply to sample data from your question - result is

Row rno id  day val fake_flag    
1   0   1   1   7        
2   1   1   2   5        
3   2   1   3   10  fake_val     
4   3   1   4   10       
5   4   1   5   11       
6   5   1   6   11       
7   6   1   7   14  fake_val     
8   7   1   8   14       
9   20  2   1   5        
10  21  2   2   7        
11  22  2   3   8        
12  23  2   4   8        
13  24  2   5   9        
14  25  2   6   9        
15  26  2   7   13  fake_val     
16  27  2   8   13       
17  28  2   9   15       
18  29  2   10  15       

Upvotes: 3

WIT
WIT

Reputation: 1083

This should accomplish what you want. I tested it out with dummy data, but if I didn't understand some part let me know and I can amend.

Select *
, CASE WHEN 
  -- Rule 1
  (LAG(val, 1) over w <= LAG(val, 2) over w)  AND 
  (val = LEAD(val, 1) over w) AND -- n = n + 1, part of rule 2
   -- Can assume row n-2 is the max because it will either be the same as row n-1 or greater than row n-1 for rule 1 to be satisfied
  (LAG(val, 2) over w <= val + 3) -- Only have to check current row val because for first part of rule 2 to be satisfied val for row n must equal val for row n + 1
  THEN 'fake_val' -- I would just have a 1 representing it is true and then 0 if not, but up to you 
  ELSE null 
  END as fake_flag
from Dataset.Table_name
WINDOW w as (ORDER BY rno ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)

Upvotes: 3

Related Questions