Reputation: 5
I have been searching a lot on SO and in pandas's help but couldn't find what I am looking for.
I have 2 dataframes with those columns :
Index([u'id', u'date', u'heure', u'titre'], dtype='object')
Index([u'article', u'id', u'type', u'rubrique', u'source', u'rebond_global',
u'chargement_global', u'visites_global'],
dtype='object')
What I'd love to be able to do is to keep the second one and to add data contained in the first dataframe using the 'id' as a key.
My final DataFrame always feels like I have made an append and added the new columns.
This is, amongst others, what I have tried :
Join method :
df1.set_index('id').join(df2.set_index('id'))
Merge method :
pd.merge(df1, df2, how='outer', on='id')
In a way, what i'm trying to do is something similar to "if id from Dataframe 1 is in DataFrame 2 then create columns 'date', 'heure' and 'titre' in DataFrame 2 and fill with value from Dataframe 1"
Is there anyway to do this?
Upvotes: 0
Views: 101
Reputation: 1961
Try this:
merged = pd.merge(left=df1[["id", "date", "heure", "titre"]], right=df2, on="id", how="inner")
Edit: Full example:
df1 = pd.DataFrame({
"id": [1, 2, 3, 4],
"date": [10, 20, 30, 40],
"heure": ["h1", "h2", "h3", "h4"],
"titre": ["t1", "t2", "t3", "t4"]
})
df2 = pd.DataFrame({
"id": [1, 2, 3, 5],
"article": ["a1", "a2", "a3", "a5"]
})
merged = pd.merge(left=df1[["id", "date", "heure", "titre"]], right=df2, on="id", how="inner")
print "DF1:\n", df1
print "DF2:\n", df2
print "Merged:\n", merged
Prints:
DF1:
date heure id titre
0 10 h1 1 t1
1 20 h2 2 t2
2 30 h3 3 t3
3 40 h4 4 t4
DF2:
article id
0 a1 1
1 a2 2
2 a3 3
3 a5 5
Merged:
id date heure titre article
0 1 10 h1 t1 a1
1 2 20 h2 t2 a2
2 3 30 h3 t3 a3
Upvotes: 0
Reputation: 3513
You want to use df2 as a basis, then join df1 using column 'id':
df2.join(df1.set_index('id'), 'id')
Upvotes: 1