Merge rows based on date range

Question

I have a pandas df with hundreds of columns and thousands of rows. Here are the 3 columns that interest us:

ID	startDate	endDate
123	2020-01-01	2020-01-25
123	2020-01-26	2020-02-08
123	2020-02-09	2020-03-12

For rows that share the same ID, I want to merge them when their date ranges follow each other, keeping all other columns intact.

For our example, the output would be a single row because the dates follow:

ID	startDate	endDate
123	2020-01-01	2020-03-12

Do you have an idea on how to do it with pandas?

jezrael · Accepted Answer

If the datetimes are not sorted, or you are not sure whether they are, aggregate with min and max:

df.groupby('ID', as_index=False).agg({'startDate': 'min', 'endDate': 'max'})
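As a quick check, here is a minimal sketch that rebuilds the three sample rows from the question and runs the aggregation above. It assumes the date columns should be parsed with pd.to_datetime first; the variable name merged is only for illustration.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "ID": [123, 123, 123],
    "startDate": ["2020-01-01", "2020-01-26", "2020-02-09"],
    "endDate": ["2020-01-25", "2020-02-08", "2020-03-12"],
})

# Assumption: parse the strings so min/max compare real dates
df["startDate"] = pd.to_datetime(df["startDate"])
df["endDate"] = pd.to_datetime(df["endDate"])

# One row per ID: earliest start, latest end
merged = df.groupby("ID", as_index=False).agg({"startDate": "min", "endDate": "max"})
print(merged)
```

Note that this collapses every row of an ID into one, whether or not the ranges are actually contiguous.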

If there are many other columns and you only need to aggregate these 2:

df['startDate'] = df.groupby('ID')['startDate'].transform('min')
df['endDate'] = df.groupby('ID')['endDate'].transform('max')

df = df.drop_duplicates('ID')
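Both snippets above merge all rows of an ID, even if there is a gap between ranges. If rows should be merged only when the dates really follow each other, one possible approach is to label runs of consecutive ranges and group on that label. The following is a sketch under assumptions, not part of the original answer: "follow" is taken to mean the next startDate is exactly one day after the previous endDate, ranges do not overlap, the other columns keep the value from the first row of each run, and the column names other and block are hypothetical.

```python
import pandas as pd

# Sample data with an extra, non-contiguous range and a hypothetical extra column
df = pd.DataFrame({
    "ID": [123, 123, 123, 123],
    "startDate": pd.to_datetime(["2020-01-01", "2020-01-26", "2020-02-09", "2020-05-01"]),
    "endDate": pd.to_datetime(["2020-01-25", "2020-02-08", "2020-03-12", "2020-05-31"]),
    "other": ["a", "b", "c", "d"],
})

df = df.sort_values(["ID", "startDate"])

# End date of the previous row within the same ID (NaT for the first row of each ID)
prev_end = df.groupby("ID")["endDate"].shift()

# A row opens a new run unless it starts the day after the previous row ended
new_block = df["startDate"] != prev_end + pd.Timedelta(days=1)

# Running count of run starts gives each run a unique label
df["block"] = new_block.cumsum()

# Assumption: keep the first value of every other column, min/max for the dates
agg = {col: "first" for col in df.columns if col not in ("ID", "block")}
agg["startDate"] = "min"
agg["endDate"] = "max"

merged = (
    df.groupby(["ID", "block"], as_index=False)
      .agg(agg)
      .drop(columns="block")
)
print(merged)
```

Here the first three rows collapse into one range and the May range stays separate. Overlapping ranges would need a different comparison, for example against a running maximum of endDate.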
