Barayjeemqaf
Barayjeemqaf

Reputation: 75

Getting the Min(startdate) and Max(enddate) for an ID when that ID shows up multiple times

I have a table with a column for ID, StartDate, EndDate, And whether or not there was a gap between the enddate of that row and the next start date. If there was only one set instance of that ID i know that I could just do

SELECT min(startdate),max(enddate)
FROM table 
GROUP BY ID

However, I have multiple instances of these IDs in several non-connected timespans. So if I were to do that I would get the very first start date and the last enddate for a different set of time for that personID. How would I go about making sure I get the min a max dates for the specific blocks of time?

I thought about potentially creating a new column where it would have a number for each set of time. So for the first set of time that has no gaps, it would have 1, then when the next row has a gap it will add +1 corresponding to a new set of time. but I am not really sure how to go about that. Here is some sample data to illustrate what I am working with:

ID    StartDate    EndDate    NextDate    Gap_ind
001   1/1/2018    1/31/2018   2/1/2018       N
001   2/1/2018    2/30/2018   3/1/2018       N
001   3/1/2018    3/31/2018   5/1/2018       Y
001   5/1/2018    5/31/2018   6/1/2018       N
001   6/1/2018    6/30/2018   6/30/2018      N

Upvotes: 1

Views: 1094

Answers (2)

Eric Brandt
Eric Brandt

Reputation: 8101

This is a classic "gaps and islands" problem, where you are trying to define the boundaries of your islands, and which you can solve by using some windowing functions.

Your initial effort is on track. Rather than getting the next start date, though, I used the previous end date to calculate the groupings.

The innermost subquery below gets the previous end date for each of your date ranges, and also assigns a row number that we use later to keep our groupings in order.

The next subquery out uses the previous end date to identify which groups of date ranges go together (overlap, or nearly so).

The outermost query is the end result you're looking for.

SELECT
  Grp.ID,
  MIN(Grp.StartDate) AS GroupingStartDate,
  MAX(Grp.EndDate) AS GroupingEndDate
FROM
  (
    SELECT
      PrevDt.ID,
      PrevDt.StartDate,
      PrevDt.EndDate,
      SUM(CASE WHEN DATEADD(DAY,1,PrevDt.PreviousEndDate) >= PrevDt.StartDate THEN 0 ELSE 1 END) 
         OVER (PARTITION BY PrevDt.ID ORDER BY PrevDt.RN) AS GrpNum
    FROM
      (
        SELECT
          ROW_NUMBER() OVER (PARTITION BY ID ORDER BY StartDate, EndDate) as RN,
          ID,
          StartDate,
          EndDate,
          LAG(EndDate,1) OVER (PARTITION BY ID ORDER BY StartDate) AS PreviousEndDate 
        FROM
          tbl
       ) AS PrevDt
  ) AS Grp
GROUP BY 
  Grp.ID,
  Grp.GrpNum;

Results:

+-----+------------------+--------------+
| ID  | InitialStartDate | FinalEndDate |
+-----+------------------+--------------+
| 001 | 2018-01-01       | 2018-03-01   |
| 001 | 2018-05-01       | 2018-06-01   |
+-----+------------------+--------------+

SQL Fiddle demo.

Further reading:

The SQL of Gaps and Islands in Sequences

Gaps and Islands Across Date Ranges

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269953

This is an example of a gaps-and-islands problem. A simple solution is to use lag() to determine if there are overlaps. When there is none, you have the start of a group. A cumulative sum defines the group -- and you aggregate on that.

select t.id, min(startdate), max(enddate)
from (select t.*,
             sum(case when prev_enddate >= dateadd(day, -1, startdate)
                      then 0 else 1
                 end) over (partition by id order by startdate) as grp
      from (select t.*, lag(enddate) over (partition by id order by startdate) as prev_enddate
            from t
           ) t
     ) t
group by id, grp;

Upvotes: 1

Related Questions