Sebastien Kaczor
Sebastien Kaczor

Reputation: 5

DataFrame, adding value from DF1 in specific row in DF2 if specific columns value is in DF1

I have been searching a lot on SO and in pandas's help but couldn't find what I am looking for.

I have 2 dataframes with those columns :

Index([u'id', u'date', u'heure', u'titre'], dtype='object')

Index([u'article', u'id', u'type', u'rubrique', u'source', u'rebond_global',
   u'chargement_global', u'visites_global'],
  dtype='object')

What I'd love to be able to do is to keep the second one and to add data contained in the first dataframe using the 'id' as a key.

My final DataFrame always feels like I have made an append and added the new columns.

This is, amongst others, what I have tried :

Join method :

df1.set_index('id').join(df2.set_index('id'))

Merge method :

pd.merge(df1, df2, how='outer', on='id')

In a way, what i'm trying to do is something similar to "if id from Dataframe 1 is in DataFrame 2 then create columns 'date', 'heure' and 'titre' in DataFrame 2 and fill with value from Dataframe 1"

Is there anyway to do this?

Upvotes: 0

Views: 101

Answers (2)

Plasma
Plasma

Reputation: 1961

Try this:

merged = pd.merge(left=df1[["id", "date", "heure", "titre"]], right=df2, on="id", how="inner")

Edit: Full example:

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "date": [10, 20, 30, 40],
    "heure": ["h1", "h2", "h3", "h4"],
    "titre": ["t1", "t2", "t3", "t4"]
    })

df2 = pd.DataFrame({
    "id": [1, 2, 3, 5],
    "article": ["a1", "a2", "a3", "a5"]
    })

merged = pd.merge(left=df1[["id", "date", "heure", "titre"]], right=df2, on="id", how="inner")

print "DF1:\n", df1
print "DF2:\n", df2
print "Merged:\n", merged

Prints:

DF1:
   date heure  id titre
0    10    h1   1    t1
1    20    h2   2    t2
2    30    h3   3    t3
3    40    h4   4    t4
DF2:
  article  id
0      a1   1
1      a2   2
2      a3   3
3      a5   5
Merged:
   id  date heure titre article
0   1    10    h1    t1      a1
1   2    20    h2    t2      a2
2   3    30    h3    t3      a3

Upvotes: 0

Rob
Rob

Reputation: 3513

You want to use df2 as a basis, then join df1 using column 'id':

df2.join(df1.set_index('id'), 'id')

Upvotes: 1

Related Questions