Reputation: 2904
I'm having two different dataframes : df1 and df2
df1 :
Id lkey
0 foo foo
1 bar bar
2 baz baz
3 foo foo
4 bar bar
5 foo foo
6 bar bar
7 bar bar
8 bar bar
df2 :
e rkey value y
0 aaa foo aaa foo
1 NaN bar bbb bar
2 ccc baz ccc baz
3 NaN mac ddd fff
4 NaN xyz eee mmm
5 NaN mnb fff NaN
6 NaN foo aaa NaN
Edit1 : added 6th row as a duplicate.
I want perform one task on this dataframes. I want to compare lkey and rkey columns.
Edit2 :
Note : lkey column contains all duplicate values and rkey column contains some duplicate values.
Pick up first value of lkey column i.e. foo compare this value with values of rkey column of dataframe. If match is find I want to know this row's value of value column in the df1 dataframe column name as match. (In every case match will get for lkey and rkey i.e. whatever lkey values present in df1 available in the rkey column of df2.)
I'm already tried with merge.
result = df1.merge(df2, left_on='lkey', right_on='rkey', how='outer')
output :
Id lkey e rkey value y
0 foo foo aaa foo aaa foo
1 foo foo aaa foo aaa foo
2 foo foo aaa foo aaa foo
3 bar bar NaN bar bbb bar
4 bar bar NaN bar bbb bar
5 bar bar NaN bar bbb bar
6 bar bar NaN bar bbb bar
7 bar bar NaN bar bbb bar
8 baz baz ccc baz ccc baz
9 NaN NaN NaN mac ddd fff
10 NaN NaN NaN xyz eee mmm
11 NaN NaN NaN mnb fff NaN
I don't want 11 rows. In my df1 only 9 rows are available with column Id and lkey. I just want to add match column with specific mapping.
Expected Output :
Id lkey match
0 foo foo aaa
1 bar bar bbb
2 baz baz ccc
3 foo foo aaa
4 bar bar bbb
5 foo foo aaa
6 bar bar bbb
7 bar bar bbb
8 bar bar bbb
How I can achieve what I want to do?
Edit : previously I was saying rkey column contains unique values but I'm facing issue because of that only, rkey column contains duplicate values.
Upvotes: 0
Views: 2210
Reputation: 109576
>>> (df1
.merge(df2[['rkey', 'value']].drop_duplicates(), left_on='lkey', right_on='rkey', how='left')
.drop('rkey', axis='columns')
.rename(columns={'value': 'match'})
)
Id lkey match
0 foo foo aaa
1 bar bar bbb
2 baz baz ccc
3 foo foo aaa
4 bar bar bbb
5 foo foo aaa
6 bar bar bbb
7 bar bar bbb
8 bar bar bbb
If the key column in both dataframes had the same name, you can just use on='key'
and wouldn't need to drop the right key.
Upvotes: 2