mattrweaver
mattrweaver

Reputation: 789

python/pandas: why the empty dataframe when using isin?

I am trying to find where "code" from df2 appears in any of the "code1,code2,code3" columns in df1 and return the rows from df2 where that match was successful.

Here is my sample code:

df1 = pd.DataFrame(
        {
        'terms' : ['term1','term2'],
        'code1': ['1234x', '4321y'],
        'code2': ['2345x','5432y'],
        'code3': ['3456x','6543y']
        }
        )
df1 = df1[['terms'] + df1.columns[:-1].tolist()]

df2 = pd.DataFrame(
        {
        'name': ['Dan','Sara'],
        'rate': ['3','3.5'],
        'location': ['FL','OH'],
        'code': ['4444g','6543y']                           
         })
df2 = df2[['name','rate','location','code']]

#combining code1,code2,code3 into new column df1['allcodes']
df1['allcodes'] = df1[df1.columns[1:]].apply(lambda x: ','.join(x.dropna().astype(str)),axis=1)

print(df2[df2['code'].isin(df1['allcodes'])])

The desired result would be the row from df2 where the code was found in df1:

   name rate location   code
1  Sara  3.5       OH  6543y

The result is:

Empty DataFrame
Columns: [name, rate, location, code]
Index: []

why is this returning an empty dataframe?

Upvotes: 3

Views: 3476

Answers (2)

piRSquared
piRSquared

Reputation: 294288

Option 1
np.in1d
Indiscriminate check if a df2.code for a particular row is anywhere in df1

df2[np.in1d(df2.code, df1[['code1', 'code2', 'code3']])]

   name rate location   code
1  Sara  3.5       OH  6543y

Speed it up

df2[np.in1d(df2.code.values, df1[['code1', 'code2', 'code3']].values)]

Option 2
Check to see if df2.loc[x, 'code'] is in df1.loc[x]
We use the pd.DataFrame.eq method so we can pass the axis=0 parameter which checks if a series is equal to each column. Basically, axis=0 means, align the index (not columns).

df2[df1[['code1', 'code2', 'code3']].eq(df2.code, 0).any(1)]

   name rate location   code
1  Sara  3.5       OH  6543y

Speed it up

df2[(df1[['code1', 'code2', 'code3']].values == df2['code'].values[:, None]).any(1)]

Upvotes: 3

Zero
Zero

Reputation: 76927

Use, filter and stack

In [4647]: df2[df2['code'].isin(df1.filter(like='code').stack())]
Out[4647]:
   name rate location   code
1  Sara  3.5       OH  6543y

Or, if you're use about code* columns to be after 1:

In [4648]: df2[df2['code'].isin(df1[df1.columns[1:]].stack())]
Out[4648]:
   name rate location   code
1  Sara  3.5       OH  6543y

Details

In [4649]: df1.filter(like='code')
Out[4649]:
   code1  code2  code3
0  1234x  2345x  3456x
1  4321y  5432y  6543y

In [4650]: df1.filter(like='code').stack()
Out[4650]:
0  code1    1234x
   code2    2345x
   code3    3456x
1  code1    4321y
   code2    5432y
   code3    6543y
dtype: object

In [4651]: df2['code'].isin(df1.filter(like='code').stack())
Out[4651]:
0    False
1     True
Name: code, dtype: bool

Upvotes: 2

Related Questions