Reputation: 4635
I want to perform a groupby operation in pandas. I want to group by the patient column and, within each group, take the doctor value from the row where the treatment column == X and copy it to every row of that group in a new column called nurse.
For example: df
import pandas as pd
import numpy as np
df = pd.DataFrame({'patient': ['a','a','a','b','b','b'],
                   'treatment': ['X','Y','Y','X','Z','Z'],
                   'doctor': ['1','2','2','2','3','3']})
  patient treatment doctor
0       a         X      1
1       a         Y      2
2       a         Y      2
3       b         X      2
4       b         Z      3
5       b         Z      3
I tried:
df=df.assign(nurse=np.where(df.['treatment'].str.contains('X'),df.groupby('patient')['doctor'], np.nan))
but I get this error:
SyntaxError: invalid syntax
The expected output is:
  patient treatment doctor nurse
0       a         X      1     1
1       a         Y      2     1
2       a         Y      2     1
3       b         X      2     2
4       b         Z      3     2
5       b         Z      3     2
How can I achieve this output? Thanks.
Upvotes: 0
Views: 65
Reputation: 30920
Use GroupBy.apply with Series.where to keep the doctor value only on the rows where treatment is X, then fill the remaining rows of each group with ffill:
df['nurse']=df.groupby('patient',sort=False).apply(lambda x: x['doctor'].where(x['treatment'].eq('X')).ffill()).reset_index(drop=True)
print(df)
  patient treatment doctor nurse
0       a         X      1     1
1       a         Y      2     1
2       a         Y      2     1
3       b         X      2     2
4       b         Z      3     2
5       b         Z      3     2
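As a possible alternative, the same masking idea can be written without apply: mask the doctor column first, then forward-fill within each patient group. This is only a sketch against the sample data from the question. The names masked and nurse_any_position are made up for illustration, and the transform('first') variant assumes each patient has at most one X row that matters.

```python
import pandas as pd

df = pd.DataFrame({'patient': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'treatment': ['X', 'Y', 'Y', 'X', 'Z', 'Z'],
                   'doctor': ['1', '2', '2', '2', '3', '3']})

# Keep the doctor value only where treatment is X; other rows become NaN.
masked = df['doctor'].where(df['treatment'].eq('X'))

# Forward-fill inside each patient group. The result keeps the original
# index, so no reset_index is needed before assigning.
df['nurse'] = masked.groupby(df['patient']).ffill()

# Variant (assumption: the X row may not be the first row of a group):
# 'first' takes the first non-null value per group and broadcasts it
# to every row of that group, including rows before the X row.
nurse_any_position = masked.groupby(df['patient']).transform('first')

print(df)
```

Because ffill only fills forward, rows that come before the X row in a group stay NaN with the first form; the transform('first') variant covers that case. Separately, the SyntaxError in the question comes from df.['treatment'], which should be df['treatment'], although np.where would still not accept a groupby object as its second argument.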
Upvotes: 3