Reputation: 77
I have the following dataframe:
ID Days TreatmentGiven TreatmentNumber
--- ---- -------------- ---------------
1 0 False NaN
1 30 False NaN
1 40 True 1
1 56 False NaN
2 0 False NaN
2 14 True 1
2 28 True 2
I'd like to create a new column with a new baseline for Days based on when the first treatment was given (TreatmentNumber==1), grouped by ID so that the result is the following:
ID Days TreatmentGiven TreatmentNumber New_Baseline
--- ---- -------------- --------------- ------------
1 0 False NaN -40
1 30 False NaN -10
1 40 True 1 0
1 56 False NaN 16
2 0 False NaN -14
2 14 True 1 0
2 28 True 2 14
What is the best way to do this?
Thank you.
Upvotes: 3
Views: 143
Reputation: 862551
Idea is filter rows with 1
in TreatmentNumber
, then convert to Series
for Series.map
by ID
used for subtract by Days
column with Series.sub
:
s = df[df['TreatmentNumber'].eq(1)].set_index('ID')['Days']
#Series created by first True rows by TreatmentGiven per groups
#s = df[df['TreatmentGiven']].drop_duplicates('ID').set_index('ID')['Days']
df['New_Baseline'] = df['Days'].sub(df['ID'].map(s))
print (df)
ID Days TreatmentGiven TreatmentNumber New_Baseline
0 1 0 False NaN -40
1 1 30 False NaN -10
2 1 40 True 1.0 0
3 1 56 False NaN 16
4 2 0 False NaN -14
5 2 14 True 1.0 0
6 2 28 True 2.0 14
Detail:
print (s)
ID
1 40
2 14
Name: Days, dtype: int64
print (df['ID'].map(s))
0 40
1 40
2 40
3 40
4 14
5 14
6 14
Name: ID, dtype: int64
Upvotes: 2
Reputation: 75080
Here is one approach with series.where
+ groupby+transform
:
s = df['Days'].where(df['TreatmentGiven']).groupby(df['ID']).transform('first')
df['New_Baseline'] = df['Days'].sub(s)
Output
ID Days TreatmentGiven TreatmentNumber New_Baseline
0 1 0 False NaN -40.0
1 1 30 False NaN -10.0
2 1 40 True 1.0 0.0
3 1 56 False NaN 16.0
4 2 0 False NaN -14.0
5 2 14 True 1.0 0.0
6 2 28 True 2.0 14.0
Upvotes: 2
Reputation: 18367
This is anoher approach:
aux = df[df['TreatmentGiven']==True].groupby('ID')['Days'].first().reset_index()
df = df.merge(aux,how='left',on='ID').rename(columns={'Days_x':'Days','Days_y':'New_baseline'})
df['New_baseline'] = df['Days'] - df['New_baseline']
Output:
ID Days TreatmentGiven TreatMentNumber New_baseline
0 1 0 False NaN -40
1 1 30 False NaN -10
2 1 40 True 1.0 0
3 1 56 False NaN 16
4 2 0 False NaN -14
5 2 14 True 1.0 0
6 2 28 True 2.0 14
Upvotes: 1