drchops
drchops

Reputation: 77

How to subtract values in a column using groupby

I have the following dataframe:

ID  Days TreatmentGiven TreatmentNumber
--- ---- -------------- ---------------
1    0      False             NaN
1    30     False             NaN
1    40     True               1
1    56     False             NaN 
2    0      False             NaN
2    14     True               1
2    28     True               2 

I'd like to create a new column with a new baseline for Days based on when the first treatment was given (TreatmentNumber==1), grouped by ID so that the result is the following:

ID  Days TreatmentGiven TreatmentNumber New_Baseline
--- ---- -------------- --------------- ------------
1    0      False             NaN          -40
1    30     False             NaN          -10
1    40     True               1            0
1    56     False             NaN           16
2    0      False             NaN          -14
2    14     True               1            0
2    28     True               2            14

What is the best way to do this?

Thank you.

Upvotes: 3

Views: 143

Answers (3)

jezrael
jezrael

Reputation: 862551

Idea is filter rows with 1 in TreatmentNumber, then convert to Series for Series.map by ID used for subtract by Days column with Series.sub:

s = df[df['TreatmentNumber'].eq(1)].set_index('ID')['Days']
#Series created by first True rows by TreatmentGiven per groups
#s = df[df['TreatmentGiven']].drop_duplicates('ID').set_index('ID')['Days']
df['New_Baseline'] = df['Days'].sub(df['ID'].map(s))
print (df)
   ID  Days  TreatmentGiven  TreatmentNumber  New_Baseline
0   1     0           False              NaN           -40
1   1    30           False              NaN           -10
2   1    40            True              1.0             0
3   1    56           False              NaN            16
4   2     0           False              NaN           -14
5   2    14            True              1.0             0
6   2    28            True              2.0            14

Detail:

print (s)
ID
1    40
2    14
Name: Days, dtype: int64

print (df['ID'].map(s))
0    40
1    40
2    40
3    40
4    14
5    14
6    14
Name: ID, dtype: int64

Upvotes: 2

anky
anky

Reputation: 75080

Here is one approach with series.where + groupby+transform:

s = df['Days'].where(df['TreatmentGiven']).groupby(df['ID']).transform('first')
df['New_Baseline'] = df['Days'].sub(s)

Output

   ID  Days  TreatmentGiven  TreatmentNumber  New_Baseline
0   1     0           False              NaN         -40.0
1   1    30           False              NaN         -10.0
2   1    40            True              1.0           0.0
3   1    56           False              NaN          16.0
4   2     0           False              NaN         -14.0
5   2    14            True              1.0           0.0
6   2    28            True              2.0          14.0

Upvotes: 2

Celius Stingher
Celius Stingher

Reputation: 18367

This is anoher approach:

aux = df[df['TreatmentGiven']==True].groupby('ID')['Days'].first().reset_index()

df = df.merge(aux,how='left',on='ID').rename(columns={'Days_x':'Days','Days_y':'New_baseline'})
df['New_baseline'] = df['Days'] - df['New_baseline']

Output:

     ID Days    TreatmentGiven  TreatMentNumber New_baseline
 0    1    0             False              NaN          -40
 1    1   30             False              NaN          -10
 2    1   40              True              1.0            0
 3    1   56             False              NaN           16
 4    2    0             False              NaN          -14
 5    2   14              True              1.0            0
 6    2   28              True              2.0           14

Upvotes: 1

Related Questions