Reputation: 490

Check if any value ( multiple columns) of one data frame exists in any values (multiple columns) of another data frame

I have two data frames with similar data in different formats

df1:

Nodo      X          Y          Z
CTB3901   CTBX3901   CTBY3901   CTBZ3901
MTR5331   MTRX5331   MTRY5331   MTRZ5331
ADC3451   ADCX3451   ADCY3451   ADCZ3451

df2:

Site_x     Site_y
CTBX3901E  CTBX3901
CTB3901    CTB3901E
CTBZ3901E  CTBZ3901
CTBY3901E  CTB3901
MADX6379E  MADX6379

I want to check if any entries from df2['Site_x', 'Site_y'] is in any of the columns df1['Nodo','X','Y','Z']. The data need not be in the same row in both frames.

The final output after the check shud be as below

Site_x     Site_y   Checked
CTBX3901E  CTBX3901  True
CTB3901    CTB3901E  True
CTBZ3901E  CTBZ3901  True
CTBY3901E  CTB3901   True
MADX6379E  MADX6379  False

Pardon me for the clumsy dataset. In desperation of getting this part right, I had to paste the same data I was working with.

I've tried isin method with the below syntax but the output has False in the entire 'Checked' column.

df2['Checked'] = df2[['Site_x','Site_y']].isin(df1[['Nodo','X','Y','Z']]).any(axis=1)

Upvotes: 1

Answers (4)

piRSquared

Reputation: 294536

Fun with Numpy

Or as I call it, Funpy

a = df2.to_numpy()[..., None, None]
b = df1.to_numpy()[None, None, ...]
df2.assign(Checked=(a == b).any(axis=(1, 2, 3)))

      Site_x    Site_y  Checked
0  CTBX3901E  CTBX3901     True
1    CTB3901  CTB3901E     True
2  CTBZ3901E  CTBZ3901     True
3  CTBY3901E   CTB3901     True
4  MADX6379E  MADX6379    False

Upvotes: 1

Erfan

Reputation: 42946

Another method using also the np.ravel() same as anky_91's answer but then using str.contains which can be more general, but less easily expandable when having more columns:

df2['Checked'] = df2['Site_x'].str.cat(df2['Site_y']).str.contains('|'.join(df1.values.ravel()))

      Site_x    Site_y  Checked
0  CTBX3901E  CTBX3901     True
1    CTB3901  CTB3901E     True
2  CTBZ3901E  CTBZ3901     True
3  CTBY3901E   CTB3901     True
4  MADX6379E  MADX6379    False

Upvotes: 1

anky

Reputation: 75150

Using np.ravel() with .isin():

df2['Checked']=df2.isin(df1.values.ravel()).any(axis=1)
print(df2)

      Site_x    Site_y  Checked
0  CTBX3901E  CTBX3901     True
1    CTB3901  CTB3901E     True
2  CTBZ3901E  CTBZ3901     True
3  CTBY3901E   CTB3901     True
4  MADX6379E  MADX6379    False

Upvotes: 3

Quang Hoang

Reputation: 150815

You are pretty close:

df2['checked'] = df2.apply(lambda x: x.isin(df1.stack())).any(axis=1)

Upvotes: 1

Check if any value ( multiple columns) of one data frame exists in any values (multiple columns) of another data frame

Answers (4)

Fun with Numpy

Related Questions