Reputation: 1
I have a simple table:
ID--- CreateDate --- Value
1 --- 2015-09-25 10:00 --- 1
1 --- 2015-09-25 10:30 --- 2
1 --- 2015-09-25 11:00 --- 3
1 --- 2015-09-25 11:30 --- 2
1 --- 2015-09-25 12:00 --- 1
2 --- 2015-09-25 10:00 --- 2
2 --- 2015-09-25 10:30 --- 3
2 --- 2015-09-25 11:00 --- 3
2 --- 2015-09-25 11:30 --- 3
2 --- 2015-09-25 12:00 --- 2
I need to find sequences in the data where the Value is above 2 for 1½ hours or more within a specific timeframe IE 24 hours. This wouldn't be a problem if I could be sure each measure point (row from each ID) came at exactly 30 minute intervals.
Then, using windowed functions, my result would look like
2 --- 2015-09-25 10:30 --- 3
2 --- 2015-09-25 11:00 --- 3
2 --- 2015-09-25 11:30 --- 3
The problem is - some IDs will generate twice as many rows (and some even more). Like this:
1 --- 2015-09-25 10:00 --- 1
1 --- 2015-09-25 10:30 --- 3
1 --- 2015-09-25 11:00 --- 3
1 --- 2015-09-25 11:30 --- 3
1 --- 2015-09-25 12:00 --- 1
2 --- 2015-09-25 10:00 --- 1
2 --- 2015-09-25 10:15 --- 2
2 --- 2015-09-25 10:30 --- 3
2 --- 2015-09-25 10:45 --- 3
2 --- 2015-09-25 11:00 --- 3
2 --- 2015-09-25 11:15 --- 3
2 --- 2015-09-25 11:30 --- 3
2 --- 2015-09-25 11:45 --- 2
2 --- 2015-09-25 12:00 --- 2
In this case I'd want my result to look like this:
1 --- 2015-09-25 10:30 --- 3
1 --- 2015-09-25 11:00 --- 3
1 --- 2015-09-25 11:30 --- 3
2 --- 2015-09-25 10:30 --- 3
2 --- 2015-09-25 10:45 --- 3
2 --- 2015-09-25 11:00 --- 3
2 --- 2015-09-25 11:15 --- 3
2 --- 2015-09-25 11:30 --- 3
But, to the best of my knowledge, windowed functions doesn't support time/column based arguments. So what are my alternatives when I cannot "count" on a specific amount of rows?
I am open to suggestions on alternatives to using SQL server, table structures, anything :)
Upvotes: 0
Views: 39
Reputation: 1269963
This is a type of gaps-and-islands problem. You need to identify the groups with adjacent sequences with a value greater than 2. One method is to use the difference of row numbers:
select t.*
from (select t.*, max(value) over (partition by id, grp) as maxvalue,
count(*) over (partition by id, grp) as cnt
from (select t.*,
(row_number() over (partition by id order by createdate) -
row_number() over (partition by id, (case when value > 2 then 1 else 0 end)
order by createdate)
) as grp
from table t
) t
) t
where cnt >= 3 and maxvalue > 2
The query then calculates the number in each group and the value, choosing the ones with the larger values and sequences of at least 3 in length. Note: you could get the min and max times and check that the difference is at least 1.5 hours as well. However, a sequence of length three seems to meet your conditions.
Upvotes: 1