Reputation: 10203
I've got a table in a Redshift database that contains intervals with lower bounds l
and upper bounds u
, like so:
| interval_id | l | u |
| ----------- | -- | -- |
| 1 | 1 | 10 |
| 2 | 2 | 5 |
| 3 | 5 | 15 |
| 4 | 26 | 30 |
| 5 | 28 | 35 |
| 6 | 30 | 31 |
| 7 | 44 | 45 |
| 8 | 56 | 58 |
I would like to partition these intervals such that the sets are at least 10 apart. The desired result for the example above would be
| interval_id | l | u | group |
| ----------- | -- | -- | ----- |
| 1 | 1 | 10 | A |
| 2 | 2 | 5 | A |
| 3 | 5 | 15 | A |
| 4 | 26 | 30 | B |
| 5 | 28 | 35 | B |
| 6 | 30 | 31 | B |
| 7 | 44 | 45 | B |
| 8 | 56 | 58 | C |
where e.g. the first three rows would belong to group a
because these intervals either overlap or outright contain each other or are less than 10 apart and row 4 belongs to group b
because its lower bound is 11
away from the upper bound of the union (well, technically the convex hull) of intervals 1-3.
Upvotes: 0
Views: 64
Reputation: 1269503
Hmmm. I think you can take the max of the upper bound for all preceding rows to see where a break occurs. Then, the rest is just a cumulative sum:
select u.*,
sum(case when l < prev_max_u + 10 then 0 else 1 end) over (order by l) as groupid
from (select t.*,
max(u) over (order by l rows between unbounded preceding and 1 preceding) as prev_max_u
from t
) u;
I would normally approach this using a correlated subquery, but I think this formulation will work. I might not work if you have two rows that are exact duplicates.
Upvotes: 1