Reputation: 13662
Consider the dataframe
df = pd.DataFrame(
[
['A', 1],
['A', 1],
['B', 1],
['B', 0],
['A', 0],
['A', 1],
['B', 1]
], columns = ['key', 'cond'])
I want to find a cumulative (running) count (starting at 1) for each key
, where we only increment if the previous value in the group had cond == 1
. When appended to the above dataframe this would give
df_result = pd.DataFrame(
[
['A', 1, 1],
['A', 1, 2],
['B', 1, 1],
['B', 0, 2],
['A', 0, 3],
['A', 1, 3],
['B', 1, 2]
], columns = ['key', 'cond'])
Note that essentially the cond
values of the last rows in each key
group have no effect.
Just doing a simple group
and cumcount
df.groupby('key').cumcount()
of course doesn't account for the cond
value of the previous element. How can I take this into account?
EDIT
As some of the solutions below don't work on some edge cases, I will give a more comprehensive dataframe for testing.
df = pd.DataFrame(
[
['A', 0],
['A', 1],
['A', 1],
['B', 1],
['B', 0],
['A', 0],
['A', 1],
['B', 1],
['B', 0]
], columns = ['key', 'cond'])
which when appending the true result should give
df_result = pd.DataFrame(
[
['A', 0, 1],
['A', 1, 1],
['A', 1, 2],
['B', 1, 1],
['B', 0, 2],
['A', 0, 3],
['A', 1, 3],
['B', 1, 2],
['B', 0, 3]
], columns = ['key', 'cond'])
Upvotes: 5
Views: 11160
Reputation: 402323
Use groupby
with combination of shift
and cumsum
.
df['new'] = df.groupby('key').cond.apply(
lambda x: x.shift().fillna(1).cumsum()
).astype(int)
df
key cond new
0 A 1 1
1 A 1 2
2 B 1 1
3 B 0 2
4 A 0 3
5 A 1 3
6 B 1 2
Upvotes: 8
Reputation: 862511
Use groupby
with custom lambda function with shift
for previous values, replace first NaN
by back filling if possible first value is 0
per key and last cumsum
with convert to int
:
df['new'] = df.groupby('key')['cond'].apply(lambda x: x.shift().bfill().cumsum()).astype(int)
print (df)
key cond new
0 A 1 1
1 A 1 2
2 B 1 1
3 B 0 2
4 A 0 3
5 A 1 3
6 B 1 2
Upvotes: 2
Reputation: 198314
df['cnt'] = df[df["cond"]==1].groupby('key').cumcount()+1
df['cnt'] = df.groupby('key')['cnt'].fillna(method='bfill')
df
# => key cond cnt
# 0 A 1 1.0
# 1 A 1 2.0
# 2 B 1 1.0
# 3 B 0 2.0
# 4 A 0 3.0
# 5 A 1 3.0
# 6 B 1 2.0
Upvotes: 4