Reputation: 2259
I have two dataframes df1
and df2
. df1
contains the columns subject_id
and time
and df2
contains the columns subject_id
and final_time
. What I want to do is for every subject_id
in df1
add a column with final_time
from df2
but only from the subject_ids
's contained in df1
. I have tried df1.merge(df2,how='left')
but still get all of the subject_id
's from df2
which is much longer and contains many duplicates of 'subject_id`.
Example of what I am looking for:
df1
subject_id time
0 15 12:00
1 20 12:05
2 21 12:10
3 25 12:00
df2
subject_id final_time
0 15 12:30
1 15 12:30
2 15 12:30
3 20 12:45
4 20 12:45
5 21 12:50
6 25 1:00
7 25 1:00
8 25 1:00
What I am looking for
subject_id time final_time
0 15 12:00 12:30
1 20 12:05 12:45
2 21 12:10 12:50
3 25 12:00 1:00
Upvotes: 2
Views: 21733
Reputation: 294488
Works for me. Nothing in results that aren't in df1
df1 = pd.DataFrame(dict(subject_id=[1, 2, 3], time=[9, 8, 7]))
df2 = pd.DataFrame(dict(subject_id=[2, 2, 4], final_time=[6, 5, 4]))
df1.merge(df2, 'left')
subject_id time final_time
0 1 9 NaN
1 2 8 6.0
2 2 8 5.0
3 3 7 NaN
Upvotes: 0
Reputation: 426
You should use
df1.merge(df2, on='subject_id')
The default for how
is inner, which will only match those entries that are in both columns. on
tells the merge to match only on the column you are interested in
Upvotes: 6