Reputation: 490
I have two data frames with similar data in different formats
df1:
Nodo X Y Z
CTB3901 CTBX3901 CTBY3901 CTBZ3901
MTR5331 MTRX5331 MTRY5331 MTRZ5331
ADC3451 ADCX3451 ADCY3451 ADCZ3451
df2:
Site_x Site_y
CTBX3901E CTBX3901
CTB3901 CTB3901E
CTBZ3901E CTBZ3901
CTBY3901E CTB3901
MADX6379E MADX6379
I want to check if any entries from df2['Site_x', 'Site_y']
is in any of the columns df1['Nodo','X','Y','Z']
. The data need not be in the same row in both frames.
The final output after the check shud be as below
Site_x Site_y Checked
CTBX3901E CTBX3901 True
CTB3901 CTB3901E True
CTBZ3901E CTBZ3901 True
CTBY3901E CTB3901 True
MADX6379E MADX6379 False
Pardon me for the clumsy dataset. In desperation of getting this part right, I had to paste the same data I was working with.
I've tried isin method with the below syntax but the output has False in the entire 'Checked' column.
df2['Checked'] = df2[['Site_x','Site_y']].isin(df1[['Nodo','X','Y','Z']]).any(axis=1)
Upvotes: 1
Views: 1056
Reputation: 294258
Or as I call it, Funpy
a = df2.to_numpy()[..., None, None]
b = df1.to_numpy()[None, None, ...]
df2.assign(Checked=(a == b).any(axis=(1, 2, 3)))
Site_x Site_y Checked
0 CTBX3901E CTBX3901 True
1 CTB3901 CTB3901E True
2 CTBZ3901E CTBZ3901 True
3 CTBY3901E CTB3901 True
4 MADX6379E MADX6379 False
Upvotes: 1
Reputation: 42916
Another method using also the np.ravel()
same as anky_91's answer but then using str.contains
which can be more general, but less easily expandable when having more columns:
df2['Checked'] = df2['Site_x'].str.cat(df2['Site_y']).str.contains('|'.join(df1.values.ravel()))
Site_x Site_y Checked
0 CTBX3901E CTBX3901 True
1 CTB3901 CTB3901E True
2 CTBZ3901E CTBZ3901 True
3 CTBY3901E CTB3901 True
4 MADX6379E MADX6379 False
Upvotes: 1
Reputation: 75080
Using np.ravel()
with .isin()
:
df2['Checked']=df2.isin(df1.values.ravel()).any(axis=1)
print(df2)
Site_x Site_y Checked
0 CTBX3901E CTBX3901 True
1 CTB3901 CTB3901E True
2 CTBZ3901E CTBZ3901 True
3 CTBY3901E CTB3901 True
4 MADX6379E MADX6379 False
Upvotes: 3
Reputation: 150735
You are pretty close:
df2['checked'] = df2.apply(lambda x: x.isin(df1.stack())).any(axis=1)
Upvotes: 1