user3209752
user3209752

Reputation: 649

MySQL - How to select 'DISTINCT' overlapping periods (dates or number ranges)

Put succinctly, if a query tells me A overlaps B then I don't need it to also tell me that B also overlaps A as they overlap each other.

So I am trying to use a self join in sql to select just 'DISTINCT' overlaps.

To illustrate, here is a simple SQL fiddle that I wrote to show inclusive overlap selection (http://sqlfiddle.com/#!9/7af84f/1)

In detail...

Assume I have a table of name (char), d1 (int), d2 (int) , the schema of which is below. Here d1 and d2 represent the start and end of some interval that might overlap with another interval in the same table,.

CREATE TABLE test (
  letter char ,
  d1 int ,
  d2 int  
) ;

Given this table I fill it with some values

INSERT INTO test (letter,d1,d2)
VALUES
   ('A',  2, 10),    -- overlaps C and D
   ('B', 12, 20),    -- overlaps E
   ('C',  5, 10),    -- overlaps A and D
   ('D',  1,  8),    -- overlaps A and C 
   ('E', 13, 15),    -- overlaps B
   ('F', 25, 30);    -- doesn't overlap anything

and run the following query that uses a self join to correctly find the rows where d1 and d2 in one row has an inclusive overlap with d1 and d2 in other rows.

-- selects all records that overlap in the range d1 - d2 inclusive
-- (excluding the implicit overlap between a record and itself)
-- The results are sorted by letter followed by d1

SELECT
  basetable.letter as test_letter,
  basetable.d1,
  basetable.d2,
  overlaptable.letter as overlap_letter,
  overlaptable.d1 as overlap_d1,
  overlaptable.d2 as overlap_d2

FROM
  test as basetable, 
  test as overlaptable
WHERE
  -- there is an inclusive overlap
  basetable.d1 <= overlaptable.d2 and basetable.d2 >= overlaptable.d1
AND
  -- the row being checked is not itsself
    basetable.letter <> overlaptable.letter
    AND
    basetable.d1 <> overlaptable.d1
    AND 
    basetable.d2 <> overlaptable.d2
ORDER BY 
  basetable.letter,
  basetable.d1

That correctly gives me the following, showing all 6 versions of overlaps eg left hand column indicates that A overlaps C and another row shows that C overlaps A (note the sqlfiddle doesn't seem to understand field aliases so my column headers are different)

test_letter     d1     d2   overlap_letter  overlap_d1  overlap_d2
  A              2     10         D              1         8
  B             12     20         E             13        15
  C              5     10         D              1         8
  D              1      8         A              2        10
  D              1      8         C              5        10
  E             13     15         B             12        20

My question is this:

How can I alter the sql to just get four rows of 'DISTINCT' or 'one way' overlaps?

ie this result...

test_letter  d1     d2  overlap_letter  overlap_d1  overlap_d2 
    A         2     10        D            1           8
    A         2     10        C            5          10
    B        12     20        E           13          15
    C         5     10        D            1           8

eg:
a result that just shows records for A, B and C in the left hand column according to the following reasoning

Upvotes: 4

Views: 798

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269793

You can just change to an inequality. And, you should also use JOIN:

SELECT basetable.letter as test_letter, basetable.d1, basetable.d2,
       overlaptable.letter as overlap_letter, overlaptable.d1 as overlap_d1, overlaptable.d2 as overlap_d2
FROM test basetable JOIN
     test overlaptable
     ON basetable.d1 <= overlaptable.d2 AND
        basetable.d2 >= overlaptable.d1
WHERE basetable.letter < overlaptable.letter  -- This is the change
ORDER BY basetable.letter, basetable.d1;

Upvotes: 2

Serg
Serg

Reputation: 22811

This can be as simple as already suggested PK ordering. Alternatively you may wish to introduce lexicographic order of a sort.

CREATE TABLE test (
  letter char ,
  d1 int ,
  d2 int  
) ;

INSERT INTO test (letter,d1,d2)
VALUES
   ('A',  2, 10),    -- overlaps C and D
   ('B', 12, 20),    -- overlaps E
   ('C',  5, 10),    -- overlaps A and D
   ('D',  1,  8),    -- overlaps A and C 
   ('E', 13, 15),    -- overlaps B
   ('F', 25, 30),    -- doesn't overlap anything
   ('G', 50, 60),    -- a set of equal intervals
   ('H', 50, 60),
   ('I', 50, 60)


SELECT
  basetable.letter as test_letter,
  basetable.d1,
  basetable.d2,
  overlaptable.letter as overlap_letter,
  overlaptable.d1 as overlap_d1,
  overlaptable.d2 as overlap_d2

FROM
  test as basetable, 
  test as overlaptable
WHERE
  -- there is an inclusive overlap
  basetable.d1 <= overlaptable.d2 and basetable.d2 >= overlaptable.d1
AND
  -- require lexicographic order: basetable starts later / finishes earlier / its letter is less then overlaptable
  basetable.d1 > overlaptable.d1 OR (basetable.d1 = overlaptable.d1 
                                     AND (basetable.d2 < overlaptable.d2 OR (basetable.d2 = overlaptable.d2 
                                                                             AND basetable.letter < overlaptable.letter)))
ORDER BY 
  overlaptable.d1, 
  basetable.d2,
  basetable.letter

Upvotes: 1

Related Questions