Awans

Reputation: 231

Merge rows based on date range

I have a pandas DataFrame with hundreds of columns and thousands of rows. Here are the three columns of interest:

ID startDate endDate
123 2020-01-01 2020-01-25
123 2020-01-26 2020-02-08
123 2020-02-09 2020-03-12

For rows that share the same ID, I want to merge them when their date ranges follow one another (each startDate is the day after the previous endDate), while keeping all other columns intact.

In this example, the output would be a single row because the date ranges are consecutive:

ID startDate endDate
123 2020-01-01 2020-03-12

Do you have any idea how to do this with pandas?

Upvotes: 3

Views: 450

Answers (2)

jezrael

Reputation: 862431

If the datetimes are not sorted, or you are not sure whether they are, use min and max for the aggregation:

df.groupby('ID', as_index=False).agg({'startDate': 'min', 'endDate': 'max'})

If there are many other columns and you only need to aggregate these two:

df['startDate'] = df.groupby('ID')['startDate'].transform('min')
df['endDate'] = df.groupby('ID')['endDate'].transform('max')

df = df.drop_duplicates('ID')
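Note that min/max per ID merges every row of an ID, even when there is a gap between ranges. If gaps can occur and only consecutive ranges should be merged, one possible approach is to label each run of consecutive rows and group on that label. The following is a minimal sketch, not a definitive implementation: the sample data, the `other` column, and the helper column `block` are hypothetical, and "consecutive" is assumed to mean that a startDate is exactly one day after the previous endDate.

```python
import pandas as pd

# Hypothetical sample data: ID 123 has a gap after 2020-02-08.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456],
    "startDate": ["2020-01-01", "2020-01-26", "2020-03-01", "2020-05-01"],
    "endDate": ["2020-01-25", "2020-02-08", "2020-03-12", "2020-05-31"],
    "other": ["a", "b", "c", "d"],
})
df["startDate"] = pd.to_datetime(df["startDate"])
df["endDate"] = pd.to_datetime(df["endDate"])

# Sort so that each row can be compared with the previous row of the same ID.
df = df.sort_values(["ID", "startDate"]).reset_index(drop=True)

# A row opens a new block unless it starts exactly one day after the
# previous endDate of the same ID (the first row of an ID always does).
prev_end = df.groupby("ID")["endDate"].shift()
new_block = df["startDate"] != prev_end + pd.Timedelta(days=1)
df["block"] = new_block.cumsum()

# Keep the first value of every other column; take min/max of the dates.
agg = {col: "first" for col in df.columns if col != "block"}
agg["startDate"] = "min"
agg["endDate"] = "max"

merged = df.groupby("block", as_index=False).agg(agg).drop(columns="block")
print(merged)
```

With this data, the first two rows of ID 123 collapse into one row, while the row starting on 2020-03-01 stays separate because of the gap. Taking `first` for the remaining columns is an assumption; a different aggregation may fit if those columns differ within a block.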

Upvotes: 4

U13-Forward

Reputation: 71560

Try groupby with agg, using first and last:

>>> df.groupby('ID', as_index=False).agg({'startDate': 'first', 'endDate': 'last'})
    ID   startDate     endDate
0  123  2020-01-01  2020-03-12
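Because first and last pick rows by their position within each group, this relies on the rows already being in date order. A minimal sketch that sorts beforehand, using the question's sample rows deliberately shuffled (the variable name `result` is only for illustration):

```python
import pandas as pd

# The three rows from the question, deliberately out of order.
df = pd.DataFrame({
    "ID": [123, 123, 123],
    "startDate": ["2020-02-09", "2020-01-01", "2020-01-26"],
    "endDate": ["2020-03-12", "2020-01-25", "2020-02-08"],
})
df["startDate"] = pd.to_datetime(df["startDate"])
df["endDate"] = pd.to_datetime(df["endDate"])

# Sort by ID and startDate so that first/last line up with the
# earliest start and the latest end of each ID.
result = (
    df.sort_values(["ID", "startDate"])
      .groupby("ID", as_index=False)
      .agg({"startDate": "first", "endDate": "last"})
)
print(result)
```

Like the unsorted version, this merges all rows of an ID and does not check whether the ranges are actually consecutive.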

Upvotes: 3
