GNMO11
GNMO11

Reputation: 2259

Pandas merge adding column

I have two dataframes df1 and df2. df1 contains the columns subject_id and time and df2 contains the columns subject_id and final_time. What I want to do is for every subject_id in df1 add a column with final_time from df2 but only from the subject_ids's contained in df1. I have tried df1.merge(df2,how='left') but still get all of the subject_id's from df2 which is much longer and contains many duplicates of 'subject_id`.

Example of what I am looking for:

df1


    subject_id  time  

 0          15  12:00 
 1          20  12:05 
 2          21  12:10
 3          25  12:00 

df2


    subject_id  final_time 

 0          15       12:30      
 1          15       12:30      
 2          15       12:30      
 3          20       12:45      
 4          20       12:45      
 5          21       12:50      
 6          25        1:00       
 7          25        1:00       
 8          25        1:00      

What I am looking for

    subject_id  time  final_time 

 0          15  12:00      12:30      
 1          20  12:05      12:45      
 2          21  12:10      12:50      
 3          25  12:00       1:00       

Upvotes: 2

Views: 21733

Answers (2)

piRSquared
piRSquared

Reputation: 294488

Works for me. Nothing in results that aren't in df1

df1 = pd.DataFrame(dict(subject_id=[1, 2, 3], time=[9, 8, 7]))
df2 = pd.DataFrame(dict(subject_id=[2, 2, 4], final_time=[6, 5, 4]))

df1.merge(df2, 'left')

   subject_id  time  final_time
0           1     9         NaN
1           2     8         6.0
2           2     8         5.0
3           3     7         NaN

Upvotes: 0

dugup
dugup

Reputation: 426

You should use

df1.merge(df2, on='subject_id')

The default for how is inner, which will only match those entries that are in both columns. on tells the merge to match only on the column you are interested in

Upvotes: 6

Related Questions