Reputation: 713
How could I join the rows based on the resume column?
I mean, go joining the rows until in resume column there is a 1.
For joined rows I want to use an aggregate function for each column, something like that:
{ 'tunein': 'first', 'tuneout': 'last', 'watching(minute)': 'sum' }
Example: from this dataframe
account_id asset_id tunein tuneout resume watching(minute)
1 61745 2967 12:42:00 13:01:00 0 19.0
2 61745 2967 15:48:00 15:51:00 1 3.0
3 61745 2967 15:52:00 15:56:00 1 4.0
4 61745 2967 15:57:00 16:23:00 0 26.0
5 61745 2967 21:06:00 21:10:00 0 4.0
6 61745 2967 22:17:00 22:37:00 1 20.0
7 61745 2967 23:55:00 23:58:00 1 3.0
get this:
account_id asset_id tunein tuneout resume watching(minute)
1 61745 2967 12:42:00 15:56:00 0 26.0
2 61745 2967 15:57:00 16:23:00 0 26.0
3 61745 2967 21:06:00 23:58:00 0 27.0
Upvotes: 0
Views: 39
Reputation: 35626
Let's create groups with cumsum
based on where there are 0
in the resume
column:
df = (
df.groupby(df['resume'].eq(0).cumsum())
.agg({'account_id': 'first', 'asset_id': 'first',
'tunein': 'first', 'tuneout': 'last', 'resume': 'first',
'watching(minute)': 'sum'})
.rename_axis(None)
)
The dictionary can be programmatically created as well:
d = {c: 'first' for c in df.columns} # default 'first'
d['tuneout'] = 'last' # set specific columns
d['watching(minute)'] = 'sum'
df = df.groupby(df['resume'].eq(0).cumsum()).agg(d).rename_axis(None)
Both options produce df
:
account_id asset_id tunein tuneout resume watching(minute)
1 61745 2967 12:42:00 15:56:00 0 26.0
2 61745 2967 15:57:00 16:23:00 0 26.0
3 61745 2967 21:06:00 23:58:00 0 27.0
How groups are created:
df['resume'].eq(0).cumsum()
0 1
1 1
2 1
3 2
4 3
5 3
6 3
Name: resume, dtype: int32
Upvotes: 1