SQL: Assembling Non-Overlapping Sets

Question

I have sets of consecutive integers, organized by type, in table1. All values are between 1 and 10, inclusive.

table1:
row_id  set_id  type    min_value   max_value
1       1       a       1           3
2       2       a       4           10
3       3       a       6           10
4       4       a       2           5
5       5       b       1           9
6       6       c       1           7
7       7       c       3           10
8       8       d       1           2
9       9       d       3           3
10      10      d       4           5
11      11      d       7           10

In table2, within each type, I want to assemble all possible maximal groups of non-overlapping sets (gaps are okay, as long as no set of the correct type could fill them). Desired output:

table2:
row_id  type    group_id    set_id
1       a       1           1
2       a       1           2
3       a       2           1
4       a       2           3
5       a       3           3
6       a       3           4
7       b       4           5
8       c       5           6
9       c       6           7
10      d       7           8
11      d       7           9
12      d       7           10
13      d       7           11
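The rule can be read as follows: a group holds sets of one type whose ranges don't overlap, and no other set of that type could be added without overlapping. That reading is an assumption about the intended rule. Under it, a brute-force Python sketch reproduces the groups in table2 from the sample rows.

It only helps with checking expected output on small examples. It tries every combination of sets per type, so it is exponential and far too slow for the real data. The names `TABLE1`, `overlaps` and `maximal_groups` are made up for this illustration.

```python
from itertools import combinations

# Sample rows from table1: (set_id, type, min_value, max_value)
TABLE1 = [
    (1, "a", 1, 3), (2, "a", 4, 10), (3, "a", 6, 10), (4, "a", 2, 5),
    (5, "b", 1, 9),
    (6, "c", 1, 7), (7, "c", 3, 10),
    (8, "d", 1, 2), (9, "d", 3, 3), (10, "d", 4, 5), (11, "d", 7, 10),
]


def overlaps(s, t):
    """True if the inclusive ranges [min_value, max_value] share any value."""
    return s[2] <= t[3] and t[2] <= s[3]


def maximal_groups(rows):
    """Brute force over every combination of sets within each type.

    Keeps combinations whose sets are pairwise non-overlapping and to which
    no other set of the same type could be added (every left-out set overlaps
    a chosen one). Exponential per type, so only usable as a checker."""
    result = []
    for typ in sorted({r[1] for r in rows}):
        sets = [r for r in rows if r[1] == typ]
        for k in range(1, len(sets) + 1):
            for combo in combinations(sets, k):
                if any(overlaps(s, t) for s, t in combinations(combo, 2)):
                    continue  # two chosen sets overlap
                left_out = [o for o in sets if o not in combo]
                if all(any(overlaps(o, s) for s in combo) for o in left_out):
                    result.append((typ, sorted(s[0] for s in combo)))
    return result


for typ, set_ids in maximal_groups(TABLE1):
    print(typ, set_ids)
# a [1, 2]
# a [1, 3]
# a [3, 4]
# b [5]
# c [6]
# c [7]
# d [8, 9, 10, 11]
```

The seven groups printed are the same seven groups as table2, with the set IDs of each group listed in ascending order.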

My current idea is to use the fact that there is a limited number of possible values. Steps:

1. Find all sets in table1 containing value 1. Copy them into table2.
2. Find all sets in table1 containing value 2 and not already in table2.
3. Join the sets from (2) with table1 on type, set_id, and having min_value greater than the group's greatest max_value.
4. For the sets from (2) that did not join in (3), insert them into table2. These start new groups that may be extended later.
5. Repeat steps (2) through (4) for values 3 through 10.

I think this will work, but it has a lot of pain-in-the-butt steps, especially for (2)--finding the sets not in table2, and (4)--finding the sets that did not join.

Do you know a faster, more efficient method? My real data has millions of sets, thousands of types, and hundreds of values (though fortunately, as in the example, the values are bounded), so scalability is essential.

I'm using PLSQL Developer with Oracle 10g (not 11g as I stated before--thanks, IT department). Thanks!

Alex Poole · Accepted Answer

For Oracle 10g you can't use recursive CTEs, but with a bit of work you can do something similar with the connect by syntax. First you need a CTE or in-line view that links each set to the next non-overlapping set(s) of the same type, meaning no other set of that type fits entirely in the gap between them. You can generate it with:

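select t1.type, t1.set_id, t1.min_value, t1.max_value,
  t2.set_id as next_set_id, t2.min_value as next_min_value,
  t2.max_value as next_max_value,
  -- unique ID per link; used later to label each chain by its starting row
  row_number() over (order by t1.type, t1.set_id, t2.set_id) as group_id
from table1 t1
-- candidate successors: sets of the same type that start after t1 ends;
-- the outer join keeps sets with no successor, with null next_* columns
left join table1 t2 on t2.type = t1.type
and t2.min_value > t1.max_value
-- keep only adjacent links: no other set of the same type fits
-- entirely in the gap between t1 and t2
where not exists (
  select 1
  from table1 t4
  where t4.type = t1.type
  and t4.min_value > t1.max_value
  and t4.max_value < t2.min_value
)
order by t1.type, group_id, t1.set_id, t2.set_id;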

This took a bit of experimentation, and it's certainly possible I've missed or lost something about the rules in the process. But it gives you 12 pseudo-rows and, as in my previous answer, it allows the two separate chains starting with a/1 to be followed while constraining the d values to a single chain:

TYPE SET_ID  MIN_VALUE  MAX_VALUE NEXT_SET_ID NEXT_MIN_VALUE NEXT_MAX_VALUE GROUP_ID
---- ------ ---------- ---------- ----------- -------------- -------------- --------
a         1          1          3           2              4             10        1 
a         1          1          3           3              6             10        2 
a         2          4         10                                                  3 
a         3          6         10                                                  4 
a         4          2          5           3              6             10        5 
b         5          1          9                                                  6 
c         6          1          7                                                  7 
c         7          3         10                                                  8 
d         8          1          2           9              3              3        9 
d         9          3          3          10              4              5       10 
d        10          4          5          11              7             10       11 
d        11          7         10                                                 12
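To trace the link query by hand, here is a rough Python mirror of its join and NOT EXISTS logic over the sample data. It is a sketch for understanding, not a replacement for the SQL, and `build_links` is a made-up name. Each returned tuple is (group_id, type, set_id, next_set_id), and the 12 links should match the pseudo-rows in the table of results.

```python
# Sample rows from table1: (set_id, type, min_value, max_value)
TABLE1 = [
    (1, "a", 1, 3), (2, "a", 4, 10), (3, "a", 6, 10), (4, "a", 2, 5),
    (5, "b", 1, 9),
    (6, "c", 1, 7), (7, "c", 3, 10),
    (8, "d", 1, 2), (9, "d", 3, 3), (10, "d", 4, 5), (11, "d", 7, 10),
]


def build_links(rows):
    """Mirror of the link query: pair each set t1 with every set t2 of the
    same type that starts after t1 ends, unless some t4 of that type fits
    entirely in the gap between them. A set with no successor gets one link
    with next_set_id None, like the left join's null columns."""
    links = []
    for set_id, typ, _, hi in rows:
        successors = []
        for t2_id, t2_typ, t2_lo, _ in rows:
            # join condition: same type and t2.min_value > t1.max_value
            if t2_typ != typ or t2_lo <= hi:
                continue
            # NOT EXISTS: no t4 starts after t1 ends and ends before t2 starts
            if not any(t4_typ == typ and t4_lo > hi and t4_hi < t2_lo
                       for _, t4_typ, t4_lo, t4_hi in rows):
                successors.append(t2_id)
        for next_id in sorted(successors) or [None]:
            links.append((typ, set_id, next_id))
    # row_number() over (order by type, set_id, next_set_id); a None only
    # ever appears alone for its set_id, so treating it as 0 is safe here
    links.sort(key=lambda link: (link[0], link[1], link[2] or 0))
    return [(n, typ, sid, nxt) for n, (typ, sid, nxt) in enumerate(links, 1)]


for link in build_links(TABLE1):
    print(link)
```

For d, only 8 -> 9, 9 -> 10 and 10 -> 11 survive, because every longer jump has a d set sitting entirely in its gap. For a, set 1 keeps two successors (2 and 3), which is what later allows two chains to start from it.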

That link query can then be used as a CTE and walked with a connect by (hierarchical) query:

with t as (
   ... -- same as above query
)
select t1.type,
  dense_rank() over (partition by null
    order by connect_by_root group_id) as group_id,
  t1.set_id
from t t1
connect by type = prior type
and set_id = prior next_set_id
start with not exists (
  select 1 from table1 t2
  where t2.type = t1.type
  and t2.max_value < t1.min_value
)
and not exists (
  select 1 from t t3
  where t3.type = t1.type
  and t3.next_max_value < t1.next_min_value
)
order by t1.type, group_id, t1.min_value;

The dense_rank() makes the group IDs contiguous; not sure if you actually need those at all, or if their sequence matters, so it's optional really. connect_by_root gives the group ID for the start of the chain, so although there were 12 rows and 12 group_id values in the initial query, they don't all appear in the final result.

The connection uses two prior values: type, and the next set ID found in the initial query. That creates all the chains, but on its own it would also include shorter chains. For d you'd see 8,9,10,11 but also 9,10,11 and 10,11, which you don't want as separate groups. Those are eliminated by the start with conditions, which could perhaps be simplified.

That gives:

TYPE GROUP_ID SET_ID
---- -------- ------
a           1      1 
a           1      2 
a           2      1 
a           2      3 
a           3      4 
a           3      3 
b           4      5 
c           5      6 
c           6      7 
d           7      8 
d           7      9 
d           7     10 
d           7     11

SQL Fiddle demo.
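The hierarchical step can be traced the same way. Below is a rough Python mirror of the connect by query, with these steps:

- It takes the 12 link rows from the first result table as hard-coded input.
- It starts only from rows that pass both start with conditions, treating comparisons with SQL NULL (None here) as never true.
- It follows set_id = prior next_set_id within the same type.
- It applies the dense_rank() renumbering to each chain's root group_id.

It is a sketch for understanding the logic, not how Oracle evaluates connect by. The names `LINKS`, `is_start`, `walk` and `assemble` are made up for this illustration.

```python
# Link rows from the first query's output:
# (type, set_id, min_value, max_value, next_set_id, next_min, next_max, group_id)
LINKS = [
    ("a", 1, 1, 3, 2, 4, 10, 1),
    ("a", 1, 1, 3, 3, 6, 10, 2),
    ("a", 2, 4, 10, None, None, None, 3),
    ("a", 3, 6, 10, None, None, None, 4),
    ("a", 4, 2, 5, 3, 6, 10, 5),
    ("b", 5, 1, 9, None, None, None, 6),
    ("c", 6, 1, 7, None, None, None, 7),
    ("c", 7, 3, 10, None, None, None, 8),
    ("d", 8, 1, 2, 9, 3, 3, 9),
    ("d", 9, 3, 3, 10, 4, 5, 10),
    ("d", 10, 4, 5, 11, 7, 10, 11),
    ("d", 11, 7, 10, None, None, None, 12),
]


def is_start(row, links, sets):
    typ, _, lo, _, _, next_lo, _, _ = row
    # start with (1): no set of the same type ends before this one starts
    if any(s_typ == typ and s_hi < lo for s_typ, _, _, s_hi in sets):
        return False
    # start with (2): no link of the same type has next_max < this next_min;
    # comparisons involving None (SQL NULL) are never true
    return not any(
        t3[0] == typ and t3[6] is not None and next_lo is not None
        and t3[6] < next_lo
        for t3 in links
    )


def walk(row, links, root_gid, out):
    """connect by type = prior type and set_id = prior next_set_id."""
    out.append((row[0], root_gid, row[1], row[2]))  # type, root, set_id, min
    for child in links:
        if child[0] == row[0] and child[1] == row[4]:
            walk(child, links, root_gid, out)


def assemble(links):
    # table1 rows, recovered from the t1 columns (every set appears as t1)
    sets = {(r[0], r[1], r[2], r[3]) for r in links}
    rows = []
    for row in links:
        if is_start(row, links, sets):
            walk(row, links, row[7], rows)  # row[7] acts as connect_by_root group_id
    # dense_rank() over (order by connect_by_root group_id)
    rank = {g: i + 1 for i, g in enumerate(sorted({r[1] for r in rows}))}
    result = [(typ, rank[g], sid, lo) for typ, g, sid, lo in rows]
    result.sort(key=lambda r: (r[0], r[1], r[3]))  # type, group_id, min_value
    return [(typ, gid, sid) for typ, gid, sid, _ in result]


for typ, gid, sid in assemble(LINKS):
    print(typ, gid, sid)
```

It prints the same 13 rows as the result above. Walking from every row instead of only the start rows would also produce the shorter d chains such as 9,10,11 and 10,11.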
