Cristian Contrera
Cristian Contrera

Reputation: 713

Rolling windows with column based condition?

How could I join the rows based on the resume column?
I mean, go joining the rows until in resume column there is a 1.
For joined rows I want to use an aggregate function for each column, something like that:

{ 'tunein': 'first', 'tuneout': 'last', 'watching(minute)': 'sum' }

Example: from this dataframe

  account_id    asset_id     tunein      tuneout resume watching(minute)
1   61745         2967      12:42:00    13:01:00    0   19.0
2   61745         2967      15:48:00    15:51:00    1   3.0
3   61745         2967      15:52:00    15:56:00    1   4.0
4   61745         2967      15:57:00    16:23:00    0   26.0
5   61745         2967      21:06:00    21:10:00    0   4.0
6   61745         2967      22:17:00    22:37:00    1   20.0
7   61745         2967      23:55:00    23:58:00    1   3.0

get this:

  account_id    asset_id     tunein      tuneout resume watching(minute)
1   61745         2967      12:42:00    15:56:00    0   26.0
2   61745         2967      15:57:00    16:23:00    0   26.0
3   61745         2967      21:06:00    23:58:00    0   27.0

Upvotes: 0

Views: 39

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35626

Let's create groups with cumsum based on where there are 0 in the resume column:

df = (
    df.groupby(df['resume'].eq(0).cumsum())
        .agg({'account_id': 'first', 'asset_id': 'first',
              'tunein': 'first', 'tuneout': 'last', 'resume': 'first',
              'watching(minute)': 'sum'})
        .rename_axis(None)
)

The dictionary can be programmatically created as well:

d = {c: 'first' for c in df.columns}  # default 'first'
d['tuneout'] = 'last'  # set specific columns
d['watching(minute)'] = 'sum'
df = df.groupby(df['resume'].eq(0).cumsum()).agg(d).rename_axis(None)

Both options produce df:

   account_id  asset_id    tunein   tuneout  resume  watching(minute)
1       61745      2967  12:42:00  15:56:00       0              26.0
2       61745      2967  15:57:00  16:23:00       0              26.0
3       61745      2967  21:06:00  23:58:00       0              27.0

How groups are created:

df['resume'].eq(0).cumsum()
0    1
1    1
2    1
3    2
4    3
5    3
6    3
Name: resume, dtype: int32

Upvotes: 1

Related Questions