Reputation: 4229
I have a dataframe like this:
Customer Id Start Date End Date Count
1403120020 2014-03-13 2014-03-17 38.0
1403120020 2014-03-18 2014-04-16 283.0
1403120020 2014-04-17 2014-04-25 100.0
1403120020 2014-04-26 2014-05-15 50.0
1812040169 2018-12-07 2018-12-19 122.0
1812040169 2018-12-19 2018-12-20 10.0
1812040169 2018-12-21 2019-01-18 365.0
Here a single customer has several rows whose start dates fall in the same month, and the end date of the last of those rows falls in the next month. I want one row per customer per start month, keeping the earliest start date and the latest end date, with the counts summed:
Customer Id Start Date End Date Count
1403120020 2014-03-13 2014-04-16 321.0
1403120020 2014-04-17 2014-05-15 150.0
1812040169 2018-12-07 2019-01-18 497.0
Upvotes: 0
Views: 30
Reputation: 13255
Use groupby.agg, taking the first start date, the last end date and the sum of the counts for each customer:
df = (df.groupby('Customer_Id')
        .agg({'Start_Date':'first', 'End_Date':'last', 'Count':'sum'})
        .reset_index())
print(df)
Customer_Id Start_Date End_Date Count
0 1403120020 2014-03-13 2014-04-16 321.0
1 1812040169 2018-12-07 2019-01-18 497.0
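EDIT: To get one row per customer per start month, also group on the month of Start_Date. The column must be a datetime for .dt to work, and grouping on the year-month period instead of the bare month number keeps the same month in different years apart:
# assumes pandas is imported as pd
df['Start_Date'] = pd.to_datetime(df['Start_Date'])
df['grp'] = df['Start_Date'].dt.to_period('M')
df = (df.groupby(['Customer_Id','grp'])
        .agg({'Start_Date':'first', 'End_Date':'last', 'Count':'sum'})
        .reset_index().drop('grp', axis=1))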
print(df)
Customer_Id Start_Date End_Date Count
0 1403120020 2014-03-13 2014-04-16 321.0
1 1403120020 2014-04-17 2014-05-15 150.0
2 1812040169 2018-12-07 2019-01-18 497.0
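For anyone who wants to reproduce this, here is a minimal self-contained sketch of the per-month grouping. It rebuilds the sample rows from the question and assumes the columns are named with underscores (Customer_Id, Start_Date, End_Date), as in this answer rather than the question. Two things in it are my own assumptions and not part of the original data: the rows are sorted first, because 'first' and 'last' depend on row order, and the group key is a year-month period (`dt.to_period('M')`) so that the same calendar month in different years is kept apart.

```python
import pandas as pd

# Sample rows copied from the question; underscore column names are an assumption.
df = pd.DataFrame({
    'Customer_Id': [1403120020] * 4 + [1812040169] * 3,
    'Start_Date': ['2014-03-13', '2014-03-18', '2014-04-17', '2014-04-26',
                   '2018-12-07', '2018-12-19', '2018-12-21'],
    'End_Date': ['2014-03-17', '2014-04-16', '2014-04-25', '2014-05-15',
                 '2018-12-19', '2018-12-20', '2019-01-18'],
    'Count': [38.0, 283.0, 100.0, 50.0, 122.0, 10.0, 365.0],
})

# The .dt accessor needs real datetimes, not strings.
df['Start_Date'] = pd.to_datetime(df['Start_Date'])
df['End_Date'] = pd.to_datetime(df['End_Date'])

# 'first' and 'last' follow row order, so sort before grouping (assumption).
df = df.sort_values(['Customer_Id', 'Start_Date'])

# Temporary group key: year-month of the start date.
df['grp'] = df['Start_Date'].dt.to_period('M')

out = (df.groupby(['Customer_Id', 'grp'])
         .agg({'Start_Date': 'first', 'End_Date': 'last', 'Count': 'sum'})
         .reset_index()
         .drop('grp', axis=1))
print(out)
```

If the real column names contain spaces, as in the question, something like `df.columns = df.columns.str.replace(' ', '_')` before this should bring them in line.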
Upvotes: 3