Creating a column that assigns max value of set of rows by condition to all rows in that group

Question

I have a dataframe that looks like this:

data    metadata
A         0
A         1
A         2
A         3
A         4
B         0
B         1
B         2
A         0
A         1
B         0
A         0
A         1
B         0

df.data contains two categories, A and B. df.metadata stores a running count of how many times a category has appeared consecutively since the category last changed. I want to create a column consecutive_count that assigns the max value of metadata per consecutive group to every row in that group. It should look like this:

data    metadata    consecutive_count
A         0              4
A         1              4
A         2              4
A         3              4
A         4              4
B         0              2
B         1              2
B         2              2
A         0              1
A         1              1
B         0              0
A         0              1
A         1              1
B         0              0

Please advise. Thank you.

Andy L. · Accepted Answer

Method 1:
You can label each consecutive run of data, then group by that label and use transform('max') to broadcast each run's maximum to every row in the run:

s = df.data.ne(df.data.shift()).cumsum()
df['consecutive_count'] = df.groupby(s).metadata.transform('max')

Out[96]:
   data  metadata  consecutive_count
0     A         0                  4
1     A         1                  4
2     A         2                  4
3     A         3                  4
4     A         4                  4
5     B         0                  2
6     B         1                  2
7     B         2                  2
8     A         0                  1
9     A         1                  1
10    B         0                  0
11    A         0                  1
12    A         1                  1
13    B         0                  0
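
To make the grouping key easier to follow, here is a minimal self-contained sketch that rebuilds the frame from the question and prints the intermediate run labels. The names starts and run_id are only illustrative (the snippet above calls the key s); everything else uses the same pandas calls as Method 1.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "data": list("AAAAABBBAABAAB"),
    "metadata": [0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 0, 0, 1, 0],
})

# True wherever the category differs from the previous row. The first row is
# compared against NaN, so it always counts as the start of a run.
starts = df["data"].ne(df["data"].shift())

# The cumulative sum of the run starts gives every consecutive run its own label.
run_id = starts.cumsum()
print(run_id.tolist())
# [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6]

# Broadcast each run's maximum back to every row of that run.
df["consecutive_count"] = df.groupby(run_id)["metadata"].transform("max")
print(df["consecutive_count"].tolist())
# [4, 4, 4, 4, 4, 2, 2, 2, 1, 1, 0, 1, 1, 0]
```

Grouping by df.data alone would merge all A rows into one group; the shift/cumsum key is what keeps the separate runs of A apart.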

Method 2:
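Since metadata increases within each consecutive run, you can instead reverse the dataframe and take a groupby cummax. The result keeps the original index, so it lines up with the original rows when assigned: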

s = df.data.ne(df.data.shift()).cumsum()
df['consecutive_count'] = df[::-1].groupby(s).metadata.cummax()

Out[101]:
   data  metadata  consecutive_count
0     A         0                  4
1     A         1                  4
2     A         2                  4
3     A         3                  4
4     A         4                  4
5     B         0                  2
6     B         1                  2
7     B         2                  2
8     A         0                  1
9     A         1                  1
10    B         0                  0
11    A         0                  1
12    A         1                  1
13    B         0                  0
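
As a rough check of why the reversal works, the sketch below runs Method 2 on the frame from the question and compares it with the transform('max') result. It assumes, as in the question, that metadata never decreases inside a run; the names run_id, reversed_cummax and via_transform are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "data": list("AAAAABBBAABAAB"),
    "metadata": [0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 0, 0, 1, 0],
})
run_id = df["data"].ne(df["data"].shift()).cumsum()

# Walk each run from its last row to its first. Because the counter only
# grows within a run, the running maximum equals the run's final value
# on every row.
reversed_cummax = df[::-1].groupby(run_id)["metadata"].cummax()

# The result still carries the original index labels (13 down to 0), so
# assigning it as a column realigns it to the original row order.
df["consecutive_count"] = reversed_cummax

# Compare against Method 1 on the same data.
via_transform = df.groupby(run_id)["metadata"].transform("max")
print((df["consecutive_count"] == via_transform).all())
```

If metadata could decrease inside a run, the reversed cummax would no longer equal the run maximum on every row, and transform('max') would be the safer choice.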
