Alexander

Reputation: 4635

groupby operations with conditionals in pandas dataframe

I want to perform a groupby operation in pandas. Specifically, I want to group by the patient column and, for the row where the treatment column == X, copy the corresponding doctor value into a new column called nurse for every row of that patient.

For example: df

import pandas as pd
import numpy as np

df = pd.DataFrame({'patient': ['a','a','a','b','b','b'],
                   'treatment': ['X','Y','Y','X','Z','Z'],
                   'doctor': ['1','2','2','2','3','3']})

  patient treatment doctor
0       a         X      1
1       a         Y      2
2       a         Y      2
3       b         X      2
4       b         Z      3
5       b         Z      3

I tried

df=df.assign(nurse=np.where(df.['treatment'].str.contains('X'),df.groupby('patient')['doctor'], np.nan))

but I get this error:

SyntaxError: invalid syntax

The expected output:

  patient treatment doctor  nurse
0       a         X      1      1
1       a         Y      2      1
2       a         Y      2      1
3       b         X      2      2
4       b         Z      3      2
5       b         Z      3      2

How can I achieve this output?

thx

Upvotes: 0

Views: 65

Answers (1)

ansev

Reputation: 30920

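Use GroupBy.apply + Series.where to keep the doctor value only on the rows where treatment is X, then fill the remaining rows of each patient forward with ffill: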

df['nurse']=df.groupby('patient',sort=False).apply(lambda x: x['doctor'].where(x['treatment'].eq('X')).ffill()).reset_index(drop=True)
print(df)

  patient treatment doctor nurse
0       a         X      1     1
1       a         Y      2     1
2       a         Y      2     1
3       b         X      2     2
4       b         Z      3     2
5       b         Z      3     2
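
As a possible alternative, here is a minimal sketch that avoids apply and reset_index by masking first and then broadcasting per patient with transform('first'). The intermediate name x_doctor is only illustrative, and the sketch assumes each patient has at most one relevant X row (as in the sample data):

```python
import pandas as pd

df = pd.DataFrame({'patient': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'treatment': ['X', 'Y', 'Y', 'X', 'Z', 'Z'],
                   'doctor': ['1', '2', '2', '2', '3', '3']})

# Keep the doctor value only where treatment is X; every other row becomes NaN.
x_doctor = df['doctor'].where(df['treatment'].eq('X'))

# 'first' returns the first non-null value per group, and transform
# broadcasts it back to every row of that patient, aligned on the index.
df['nurse'] = x_doctor.groupby(df['patient']).transform('first')

print(df)
```

Unlike ffill, this also fills rows that come before the X row within a patient, and it does not depend on the rows being ordered by patient or on a default integer index. A patient with no X row gets NaN in nurse. As for the SyntaxError in the question, it comes from the stray dot in df.['treatment'], which should be df['treatment'].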

Upvotes: 3
