Reputation: 3803
I have a Pandas DataFrame which look like this:
status nb_success id
ok 0 A
ok 0 A
ok 0 A
nok 0 B
I would like to count the number of ok
status per id
by incrementing the nb_success
each time the status
is ok
.
So, the result I would like to get is the following dataframe:
status nb_success id
ok 0 A
ok 1 A
ok 2 A
nok 0 B
To me, the way to go would be to use df.apply, which I have done like:
def update_count(nb_success, status):
if status is 'ok':
return nb_success + 1
dataframe['nb_success'] = dataframe.apply(lambda row: update_count(row['nb_success'], row['status']), axis=1)
But when I try to print
all the row where nb_success != 0
, I got an empty Dataframe.
print(dataframe.loc[dataframe['nb_success'] != 0])
What do I do wrong here ?
Upvotes: 1
Views: 49
Reputation: 863166
Better is use loc
with cumsum
of boolean mask - True
values are converted to 1
, get Series 1,2,3...
, so need subtract by sub
:
mask = df['status'] == 'ok'
df.loc[mask, 'nb_success'] = mask.cumsum() - 1
print (df)
status nb_success id
0 ok 0 A
1 ok 1 A
2 ok 2 A
3 nok 0 B
Or use mask
:
mask = df['status'] == 'ok'
df['nb_success']= df['nb_success'].mask(mask, mask.cumsum() - 1)
print (df)
status nb_success id
0 ok 0 A
1 ok 1 A
2 ok 2 A
3 nok 0 B
Solutions with cumcount
- a bit overcomplicated:
I think you need mask
:
df ['nb_success']= df['nb_success'].mask(df['status'] == 'ok',
df.groupby('status')['nb_success'].cumcount())
print (df)
status nb_success id
0 ok 0 A
1 ok 1 A
2 ok 2 A
3 nok 0 B
Similar solution with loc
:
mask = df['status'] == 'ok'
df.loc[mask, 'nb_success']= df[mask].groupby('status')['nb_success'].cumcount()
print (df)
status nb_success id
0 ok 0 A
1 ok 1 A
2 ok 2 A
3 nok 0 B
Upvotes: 3
Reputation: 519
success_count = 0
def update_count(nb_success, status):
global success_count
if status is 'ok':
success_count += 1
return success_count
dataframe['nb_success'] = dataframe.apply(lambda row: update_count(row['nb_success'], row['status']), axis=1)
test this
Upvotes: 0