Convert data of overlapping time ranges to data of distinct time ranges

Question

I have data that consists of observations of the range of time a particular status applied for an individual. An individual can have multiple statuses at a given time, or have no stats at all (in which case there would not be an observation).

ID  STATUS START      END    
1   A      2013-03-07 2013-04-20
1   B      2013-04-10 2013-05-10
1   C      2013-04-16 2013-07-11
1   B      2013-07-25 2013-08-08
2   A      2013-09-10 2014-04-21
2   C      2013-12-27 2014-01-26
2   D      2014-04-28 2014-05-10
2   E      2014-05-11 2014-07-16

I would like to convert this into a dataframe consisting of distinct, non-overlapping periods and all of the statuses that applied during those periods. The result from the example would be as below.

ID  STATUS START      END    
1   A      2013-03-07 2013-04-10
1   A,B    2013-04-10 2013-04-16
1   A,B,C  2013-04-16 2013-04-20
1   B,C    2013-04-20 2013-05-10
1   C      2013-05-10 2013-07-11
1   B      2013-07-25 2013-08-08
2   A      2013-09-10 2013-12-27
2   A,C    2013-12-27 2014-01-26
2   D      2014-04-28 2014-05-10
2   E      2014-05-11 2014-07-16

My data is fairly large, and so I struggling to find an efficient way to do this. I've found semi-related problems, such as here, but they typically involve collapsing overlapping regions into a super region, rather than breaking them out into distinct subregions.

Any help or ideas would be appreciated. Thanks.

Convert data of overlapping time ranges to data of distinct time ranges

Answers (1)

Related Questions