Reputation: 1679
I have a dataframe that looks like this:
data metadata
A 0
A 1
A 2
A 3
A 4
B 0
B 1
B 2
A 0
A 1
B 0
A 0
A 1
B 0
df.data
contains two different categories, A
and B
. df.metadata
stores a running count the number of times a category appears consecutively before the category changes. I want to create a column consecutive_count
that assigns the max value of metadata per consecutive group to every row in that group. It should look like this:
data metadata consecutive_count
A 0 4
A 1 4
A 2 4
A 3 4
A 4 4
B 0 2
B 1 2
B 2 2
A 0 1
A 1 1
B 0 0
A 0 1
A 1 1
B 0 0
Please advise. Thank you.
Upvotes: 1
Views: 37
Reputation: 25269
Method 1:
You may try transform max
on groupby of each group of data
s = df.data.ne(df.data.shift()).cumsum()
df['consecutive_count'] = df.groupby(s).metadata.transform('max')
Out[96]:
data metadata consecutive_count
0 A 0 4
1 A 1 4
2 A 2 4
3 A 3 4
4 A 4 4
5 B 0 2
6 B 1 2
7 B 2 2
8 A 0 1
9 A 1 1
10 B 0 0
11 A 0 1
12 A 1 1
13 B 0 0
Method 2:
Since metadata
is sorted per group, you may reverse dataframe and do groupby cummax
s = df.data.ne(df.data.shift()).cumsum()
df['consecutive_count'] = df[::-1].groupby(s).metadata.cummax()
Out[101]:
data metadata consecutive_count
0 A 0 4
1 A 1 4
2 A 2 4
3 A 3 4
4 A 4 4
5 B 0 2
6 B 1 2
7 B 2 2
8 A 0 1
9 A 1 1
10 B 0 0
11 A 0 1
12 A 1 1
13 B 0 0
Upvotes: 2