user5375395
user5375395

Reputation: 1

Values for previous X rows based on time between rows

I have a simple table:

ID--- CreateDate --- Value 

1 --- 2015-09-25 10:00 --- 1 
1 --- 2015-09-25 10:30 --- 2 
1 --- 2015-09-25 11:00 --- 3 
1 --- 2015-09-25 11:30 --- 2 
1 --- 2015-09-25 12:00 --- 1 
2 --- 2015-09-25 10:00 --- 2 
2 --- 2015-09-25 10:30 --- 3 
2 --- 2015-09-25 11:00 --- 3 
2 --- 2015-09-25 11:30 --- 3 
2 --- 2015-09-25 12:00 --- 2 

I need to find sequences in the data where the Value is above 2 for 1½ hours or more within a specific timeframe IE 24 hours. This wouldn't be a problem if I could be sure each measure point (row from each ID) came at exactly 30 minute intervals.

Then, using windowed functions, my result would look like

2 --- 2015-09-25 10:30 --- 3 
2 --- 2015-09-25 11:00 --- 3 
2 --- 2015-09-25 11:30 --- 3 

The problem is - some IDs will generate twice as many rows (and some even more). Like this:

1 --- 2015-09-25 10:00 --- 1 
1 --- 2015-09-25 10:30 --- 3 
1 --- 2015-09-25 11:00 --- 3 
1 --- 2015-09-25 11:30 --- 3 
1 --- 2015-09-25 12:00 --- 1 
2 --- 2015-09-25 10:00 --- 1 
2 --- 2015-09-25 10:15 --- 2 
2 --- 2015-09-25 10:30 --- 3 
2 --- 2015-09-25 10:45 --- 3 
2 --- 2015-09-25 11:00 --- 3 
2 --- 2015-09-25 11:15 --- 3 
2 --- 2015-09-25 11:30 --- 3 
2 --- 2015-09-25 11:45 --- 2 
2 --- 2015-09-25 12:00 --- 2

In this case I'd want my result to look like this:

1 --- 2015-09-25 10:30 --- 3 
1 --- 2015-09-25 11:00 --- 3 
1 --- 2015-09-25 11:30 --- 3 
2 --- 2015-09-25 10:30 --- 3 
2 --- 2015-09-25 10:45 --- 3 
2 --- 2015-09-25 11:00 --- 3 
2 --- 2015-09-25 11:15 --- 3 
2 --- 2015-09-25 11:30 --- 3 

But, to the best of my knowledge, windowed functions doesn't support time/column based arguments. So what are my alternatives when I cannot "count" on a specific amount of rows?

I am open to suggestions on alternatives to using SQL server, table structures, anything :)

Upvotes: 0

Views: 39

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269963

This is a type of gaps-and-islands problem. You need to identify the groups with adjacent sequences with a value greater than 2. One method is to use the difference of row numbers:

select t.*
from (select t.*, max(value) over (partition by id, grp) as maxvalue,
             count(*) over (partition by id, grp) as cnt
      from (select t.*, 
                   (row_number() over (partition by id order by createdate) -
                    row_number() over (partition by id, (case when value > 2 then 1 else 0 end)
                                       order by createdate)
                   ) as grp
            from table t
           ) t
     ) t
where cnt >= 3 and maxvalue > 2

The query then calculates the number in each group and the value, choosing the ones with the larger values and sequences of at least 3 in length. Note: you could get the min and max times and check that the difference is at least 1.5 hours as well. However, a sequence of length three seems to meet your conditions.

Upvotes: 1

Related Questions