Reputation: 789
I am trying to find where "code" from df2 appears in any of the "code1,code2,code3" columns in df1 and return the rows from df2 where that match was successful.
Here is my sample code:
df1 = pd.DataFrame(
{
'terms' : ['term1','term2'],
'code1': ['1234x', '4321y'],
'code2': ['2345x','5432y'],
'code3': ['3456x','6543y']
}
)
df1 = df1[['terms'] + df1.columns[:-1].tolist()]
df2 = pd.DataFrame(
{
'name': ['Dan','Sara'],
'rate': ['3','3.5'],
'location': ['FL','OH'],
'code': ['4444g','6543y']
})
df2 = df2[['name','rate','location','code']]
#combining code1,code2,code3 into new column df1['allcodes']
df1['allcodes'] = df1[df1.columns[1:]].apply(lambda x: ','.join(x.dropna().astype(str)),axis=1)
print(df2[df2['code'].isin(df1['allcodes'])])
The desired result would be the row from df2 where the code was found in df1:
name rate location code
1 Sara 3.5 OH 6543y
The result is:
Empty DataFrame
Columns: [name, rate, location, code]
Index: []
why is this returning an empty dataframe?
Upvotes: 3
Views: 3476
Reputation: 294288
Option 1
np.in1d
Indiscriminate check if a df2.code
for a particular row is anywhere in df1
df2[np.in1d(df2.code, df1[['code1', 'code2', 'code3']])]
name rate location code
1 Sara 3.5 OH 6543y
Speed it up
df2[np.in1d(df2.code.values, df1[['code1', 'code2', 'code3']].values)]
Option 2
Check to see if df2.loc[x, 'code']
is in df1.loc[x]
We use the pd.DataFrame.eq
method so we can pass the axis=0
parameter which checks if a series is equal to each column. Basically, axis=0
means, align the index (not columns).
df2[df1[['code1', 'code2', 'code3']].eq(df2.code, 0).any(1)]
name rate location code
1 Sara 3.5 OH 6543y
Speed it up
df2[(df1[['code1', 'code2', 'code3']].values == df2['code'].values[:, None]).any(1)]
Upvotes: 3
Reputation: 76927
Use, filter
and stack
In [4647]: df2[df2['code'].isin(df1.filter(like='code').stack())]
Out[4647]:
name rate location code
1 Sara 3.5 OH 6543y
Or, if you're use about code*
columns to be after 1:
In [4648]: df2[df2['code'].isin(df1[df1.columns[1:]].stack())]
Out[4648]:
name rate location code
1 Sara 3.5 OH 6543y
Details
In [4649]: df1.filter(like='code')
Out[4649]:
code1 code2 code3
0 1234x 2345x 3456x
1 4321y 5432y 6543y
In [4650]: df1.filter(like='code').stack()
Out[4650]:
0 code1 1234x
code2 2345x
code3 3456x
1 code1 4321y
code2 5432y
code3 6543y
dtype: object
In [4651]: df2['code'].isin(df1.filter(like='code').stack())
Out[4651]:
0 False
1 True
Name: code, dtype: bool
Upvotes: 2