Reputation: 3011
I have two data frames
df1
Srlno id image
1 3 image1.jpg
2 3 image2.jpg
3 3 image2.jpg
df2
Srlno id image
1 1 image1.jpg
2 2 image2.jpg
3 3 image3.jpg
I want to match both the data frames based on the column Image and return the Id from df2 to df1 as a newcolumn. The image names in df2 are unique whereas the image names in df1 has lot of duplicates. I want to retain the duplicate image names but fill in the correct id for each image from df2.
The expected output is :
Srlno id image newids
1 3 image1.jpg 1
2 3 image2.jpg 2
3 3 image2.jpg 2
I tried with
df1['newids'] = df1['image'].map(df2.set_index('image')['id'])
This returns an error InvalidInvexError('Reindexing only valid with uniquely valued index objects') I understand the duplicates in df1 is creating this error...but don't know how to resolve.
Upvotes: 4
Views: 1405
Reputation: 75080
Another solution with dict(zip())
df1['newids']=df1.image.map(dict(zip(df2.image,df2.id)))
print(df1)
Srlno id image newids
0 1 3 image1.jpg 1
1 2 3 image2.jpg 2
2 3 3 image2.jpg 2
Upvotes: 6
Reputation: 862441
Use drop_duplicates
for get only unique image
value for map
:
#default keep first dupe
s = df2.drop_duplicates('image').set_index('image')['id']
df1['newids'] = df1['image'].map(s)
#keep last dupe
s = df2.drop_duplicates('image', keep='last').set_index('image')['id']
df1['newids'] = df1['image'].map(s)
#keep last dupe
d = dict(zip(df2['image'], df2['id']))
df1['newids'] = df1['image'].map(d)
Upvotes: 3