Reputation: 51
So I have no idea how to this, and looking around for half a day I have not found my solution.
My data looks something like this
import pandas as pd
df1 = pd.DataFrame(
[['132','233','472','098'], ['482','214','980',''], ['107','','',''],
['571','498','',''],], columns=["p1", "p2", "p3", "p4"])
df2 = pd.DataFrame(['532','233','980','132', '298'], columns=["p"])
df1
p1 p2 p3
0 132 233 472
1 482 214 980
2 107
3 571 498
df2
p
0 532
1 233
2 980
3 132
4 298
I wish to match the values in the p column with any one of the values in the p{1-3} columns, and create a new column which contains the matched string.
So in this instance my desired output is
df_output
p1 p2 p3 matched_p
0 132 233 472 233
1 482 214 980 980
2 107
3 571 498
I tried the following
filter1 = df1['p1'].isin(df2['p'])
filter2 = df1['p2'].isin(df2['p'])
filter3 = df1['p3'].isin(df2['p'])
df1['matched_p'] = df2['p'][filter1 | filter2 | filter3]
however, this gave me non-sensical results.
Any ideas on how to approach this problem?
Upvotes: 2
Views: 83
Reputation: 20669
You can try this. Using df.isin
and df.where
with df.max
over axis 1.
df1 = df1.replace('',np.nan).astype(float) # to convert everything to float.
df2 = df2.astype(float) #to convert everything to float.
m = df1.isin(df2['p'].to_numpy())
df1['matched_values'] = df1.where(m,0).max(1)
df1
p1 p2 p3 p4 matched_values
0 132.0 233.0 472.0 98.0 233.0
1 482.0 214.0 980.0 NaN 980.0
2 107.0 NaN NaN NaN NaN
3 571.0 498.0 NaN NaN NaN
If you don't want to convert your dtypes to float
.
Inspired from @Erfan's solution. I combined our approaches.
df1['matched'] = (df1.where(
df1.isin(df2['p'].to_numpy()),'').
add(',').sum(1).str.strip(','))
Upvotes: 2
Reputation: 42886
We can use stack
and unstack
here with isin
and some string manipulation. This will also account for multiple matches:
d1 = df1.stack()
d1 = d1.where(d1.isin(df2['p'])).unstack().fillna('')
d1 = d1.add(',').sum(axis=1).str.strip(',')
df1['matched_p'] = d1
p1 p2 p3 p4 matched_p
0 132 233 472 098 132,233
1 482 214 980 980
2 107
3 571 498
Upvotes: 2
Reputation: 260
set1 = set(df2['p'])
df1['p'] = df1.apply(lambda x: {x['p1'], x['p2'], x['p3'], x['p4']}.intersection(set1), axis=1)
df1['p'] = df1['p'].map(lambda x: x.pop() if x else '')
Upvotes: 0