Mornor
Mornor

Reputation: 3803

Issue when changing row value based of columns in a Pandas DataFrame

I have a Pandas DataFrame which look like this:

status    nb_success     id 
    ok             0      A
    ok             0      A 
    ok             0      A
    nok            0      B

I would like to count the number of ok status per id by incrementing the nb_success each time the status is ok. So, the result I would like to get is the following dataframe:

status    nb_success     id 
    ok             0      A
    ok             1      A 
    ok             2      A
    nok            0      B

To me, the way to go would be to use df.apply, which I have done like:

def update_count(nb_success, status):
        if status is 'ok': 
            return nb_success + 1

dataframe['nb_success'] = dataframe.apply(lambda row: update_count(row['nb_success'], row['status']), axis=1)

But when I try to print all the row where nb_success != 0, I got an empty Dataframe. print(dataframe.loc[dataframe['nb_success'] != 0])

What do I do wrong here ?

Upvotes: 1

Views: 49

Answers (2)

jezrael
jezrael

Reputation: 863166

Better is use loc with cumsum of boolean mask - True values are converted to 1, get Series 1,2,3..., so need subtract by sub:

mask = df['status'] == 'ok'
df.loc[mask, 'nb_success'] = mask.cumsum() - 1
print (df)
  status  nb_success id
0     ok           0  A
1     ok           1  A
2     ok           2  A
3    nok           0  B

Or use mask:

mask = df['status'] == 'ok'
df['nb_success']= df['nb_success'].mask(mask, mask.cumsum() - 1)
print (df)
  status  nb_success id
0     ok           0  A
1     ok           1  A
2     ok           2  A
3    nok           0  B

Solutions with cumcount - a bit overcomplicated:

I think you need mask:

df ['nb_success']= df['nb_success'].mask(df['status'] == 'ok',
                                         df.groupby('status')['nb_success'].cumcount())
print (df)
  status  nb_success id
0     ok           0  A
1     ok           1  A
2     ok           2  A
3    nok           0  B

Similar solution with loc:

mask = df['status'] == 'ok'
df.loc[mask, 'nb_success']= df[mask].groupby('status')['nb_success'].cumcount()
print (df)
  status  nb_success id
0     ok           0  A
1     ok           1  A
2     ok           2  A
3    nok           0  B

Upvotes: 3

sunt
sunt

Reputation: 519

success_count = 0
def update_count(nb_success, status):
        global success_count
        if status is 'ok': 
            success_count += 1
        return success_count 

dataframe['nb_success'] = dataframe.apply(lambda row: update_count(row['nb_success'], row['status']), axis=1)

test this

Upvotes: 0

Related Questions