Given date ranges and corresponding IDs, find groups of IDs with overlapping dates

Question

I have a table of date ranges and corresponding IDs. I want to group the IDs according to whether their start/end ranges overlap: if the date range for one ID lies partially or completely within the range for another ID, the two should belong to the same group. I want to add a column indicating this grouping, along with the group's start and end dates, taken as the smallest and largest dates within the group.

The data:

"ID"    "start" "end"
1   2018-10-02  2019-01-15
2   2019-01-13  2019-02-01
3   2018-10-01  2018-11-01
4   2018-10-05  2018-10-06
5   2019-09-09  2019-10-08
6   2019-02-06  2019-04-07
7   2019-03-24  2019-04-17
8   2019-03-21  2019-04-14
9   2019-03-27  2019-04-16
10  2019-04-30  2019-05-08

The ideal result:

"ID"    "start" "end"   "group_ID"  "group_start"   "group_end"
1   2018-10-02  2019-01-15  1   2018-10-01  2019-02-01
2   2019-01-13  2019-02-01  1   2018-10-01  2019-02-01
3   2018-10-01  2018-11-01  1   2018-10-01  2019-02-01
4   2018-10-05  2018-10-06  1   2018-10-01  2019-02-01
5   2019-09-09  2019-10-08  2   2019-09-09  2019-10-08
6   2019-02-06  2019-04-07  3   2019-02-06  2019-05-08
7   2019-03-24  2019-04-17  3   2019-02-06  2019-05-08
8   2019-03-21  2019-04-14  3   2019-02-06  2019-05-08
9   2019-03-27  2019-04-16  3   2019-02-06  2019-05-08
10  2019-04-30  2019-05-08  3   2019-02-06  2019-05-08

What I've been thinking of is creating a matrix of IDs (i.e., rows and columns spanning ID 1 to ID 10) and filling each cell according to whether the date ranges for that pair of IDs overlap. I would then bin the IDs into groups and find the min/max dates for each group, but this seems really complicated. There must be an easier solution that does not involve looking at edges in a matrix to create clusters.

Edit: the same data in .csv format:

ID,start,end
1,2018-10-02,2019-01-15
2,2019-01-13,2019-02-01
3,2018-10-01,2018-11-01
4,2018-10-05,2018-10-06
5,2019-09-09,2019-10-08
6,2019-02-06,2019-04-07
7,2019-03-24,2019-04-17
8,2019-03-21,2019-04-14
9,2019-03-27,2019-04-16
10,2019-04-30,2019-05-08
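
For illustration, one way to avoid the pairwise matrix is to sort the ranges by start date and sweep through them once, extending the current group while each new range starts on or before the latest end seen so far. The sketch below is only a minimal example under assumptions: the question does not name a language, so Python is used here, and the helper names `load_rows` and `group_overlapping` are hypothetical. It also assumes that a range starting on the same day another one ends counts as overlapping, and that groups are numbered in order of first appearance in the input.

```python
import csv
import io
from datetime import date

# The .csv data from the question, embedded so the sketch is self-contained.
CSV_TEXT = """ID,start,end
1,2018-10-02,2019-01-15
2,2019-01-13,2019-02-01
3,2018-10-01,2018-11-01
4,2018-10-05,2018-10-06
5,2019-09-09,2019-10-08
6,2019-02-06,2019-04-07
7,2019-03-24,2019-04-17
8,2019-03-21,2019-04-14
9,2019-03-27,2019-04-16
10,2019-04-30,2019-05-08
"""


def load_rows(text):
    """Parse the CSV text into a list of (id, start, end) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (int(r["ID"]), date.fromisoformat(r["start"]), date.fromisoformat(r["end"]))
        for r in reader
    ]


def group_overlapping(rows):
    """Map each id to (group_id, group_start, group_end).

    Assumption: ranges that merely touch (start == previous end) overlap.
    """
    # Sweep over the ranges in order of start date.
    clusters = []  # each cluster is [start, end, [ids]]
    for id_, start, end in sorted(rows, key=lambda r: r[1]):
        if clusters and start <= clusters[-1][1]:
            # Overlaps the current cluster: extend its end if needed.
            clusters[-1][1] = max(clusters[-1][1], end)
            clusters[-1][2].append(id_)
        else:
            clusters.append([start, end, [id_]])

    cluster_index = {}
    for i, (_, _, ids) in enumerate(clusters):
        for id_ in ids:
            cluster_index[id_] = i

    # Number groups by first appearance in the original row order.
    numbering = {}
    result = {}
    for id_, _, _ in rows:
        i = cluster_index[id_]
        group_id = numbering.setdefault(i, len(numbering) + 1)
        result[id_] = (group_id, clusters[i][0], clusters[i][1])
    return result


if __name__ == "__main__":
    rows = load_rows(CSV_TEXT)
    groups = group_overlapping(rows)
    for id_, start, end in rows:
        group_id, group_start, group_end = groups[id_]
        print(id_, start, end, group_id, group_start, group_end)
```

Under this strict overlap rule, IDs 1 to 4 and ID 5 come out as in the ideal result, but ID 10 (starting 2019-04-30) does not overlap any of IDs 6 to 9 (latest end 2019-04-17), so it lands in a group of its own rather than in group 3. Reproducing the ideal result exactly would need some additional rule, such as a tolerated gap between ranges, which the question does not specify.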
