Reputation: 1644
I'm trying to find out whether the values of a Series appear in a DataFrame. Here is a simple example:
import numpy as np  # needed for the np.sum call further down
import pandas as pd
s = pd.Series([1.1, 2.2])
df = pd.DataFrame()
df['a'] = [1.1, 3.3]
df['b'] = [4.4, 5.5]
I want to check whether 1.1 is present in any column of df and, if it is, get the value True. I want this as a boolean mask so that I can use it as a selection criterion on another df.
I have tried:
>>> s.isin(df)
0 False
1 False
also:
>>> s.values == df
       a      b
0   True  False
1  False  False
This looks like it is on the right track, so I try to derive a mask from it:
>>> np.sum(s.values == df, axis=1) >0
0 True
1 False
dtype: bool
That looks good, but when I try this on my actual data I get an unusual error.
I am only running the following three print statements, followed by their output:
print(df)
print(series)
print(np.sum([series.values==df], axis=1) >0)
                  3rd        2nd        1st
Date
1995-01-03        NaN        NaN        NaN
1995-01-04        NaN        NaN        NaN
1995-01-05        NaN        NaN        NaN
1995-01-06        NaN        NaN        NaN
1995-01-09        NaN        NaN        NaN
...               ...        ...        ...
2021-05-19  19.531345  16.888084  15.596825
2021-05-20  19.422386  16.571087  14.174667
2021-05-21  19.283871  16.112516  14.281031
2021-05-24  19.167412  15.726680  14.438169
2021-05-25  19.157773  15.488945  14.616952
[6815 rows x 3 columns]
Date
1995-01-03 NaN
1995-01-04 NaN
1995-01-05 NaN
1995-01-06 NaN
1995-01-09 NaN
...
2021-05-19 14.016911
2021-05-20 14.174667
2021-05-21 14.281031
2021-05-24 14.438169
2021-05-25 14.616952
Name: series_name, Length: 6815, dtype: float64
CRITICAL:root:Global try/catch caught an unhandled exception
ERROR:root:Unable to coerce to Series, length must be 3: given 6815
ValueError: Unable to coerce to Series, length must be 3: given 6815
The sizes of the series and the DataFrame seem to be correct, and the final four values in the series (ending in 14.616952) match the DataFrame's final column, so I would expect the mask to have a True value for those rows.
If anyone can diagnose this bug, it would be really helpful. Many thanks.
Upvotes: 0
Views: 600
Reputation: 133518
Use the .isin method of pandas as follows. Put simply: apply isin on the DataFrame with respect to the series, then apply any along axis=1 to get a single True or False for each whole row.
df.isin(series).any(axis=1)
Output will be as follows:
0 True
1 False
dtype: bool
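As a minimal sketch using the question's toy data: when a Series is passed, DataFrame.isin matches on index labels as well as values, so each row is compared only with the series entry that has the same label. Passing a plain list instead checks membership regardless of position. The names s_shifted and other are hypothetical and only there for illustration.

```python
import pandas as pd

s = pd.Series([1.1, 2.2])
df = pd.DataFrame({"a": [1.1, 3.3], "b": [4.4, 5.5]})

# Series argument: row i of df is compared with s[i] only (index-aligned).
row_mask = df.isin(s).any(axis=1)

# Same values in a different order (hypothetical): the aligned check no
# longer matches, because 1.1 now sits at label 1 rather than label 0.
s_shifted = pd.Series([2.2, 1.1])
shifted_mask = df.isin(s_shifted).any(axis=1)

# List argument: a value of the series anywhere in the row counts as a match.
anywhere_mask = df.isin(s_shifted.tolist()).any(axis=1)

# Hypothetical second frame sharing df's index, filtered with the mask.
other = pd.DataFrame({"x": [10, 20]})
selected = other[row_mask]
```

Which variant fits depends on whether a match should count only on the same row or anywhere in the frame.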
Upvotes: 2
Reputation: 150745
You have an extra [] around your command:
print(df)
print(series)
print(np.sum([series.values==df], axis=1) >0)
             ^ remove this     ^ and this
Also consider all:
(series.values==df).all(axis=1)
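For context on the ValueError, here is a hedged sketch with made-up data (the dates and values are stand-ins for the question's frame, not the real data). Comparing a DataFrame with a 1-D NumPy array lines the array up against the columns, so its length has to equal the number of columns. That would explain why the toy example, with 2 values and 2 columns, ran without error. Assuming a row-by-row comparison is what is wanted, DataFrame.eq with axis=0 aligns the series with the rows instead.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2021-05-19", periods=4, freq="D")
df = pd.DataFrame(
    {
        "3rd": [19.5, 19.4, 19.3, np.nan],
        "2nd": [16.9, 16.6, 16.1, np.nan],
        "1st": [15.6, 14.2, 14.3, np.nan],
    },
    index=idx,
)
series = pd.Series([14.0, 14.2, 14.3, np.nan], index=idx, name="series_name")

# A length-4 array cannot be lined up with 3 columns, so this raises.
try:
    df == series.values
    raised = False
except ValueError:
    raised = True

# axis=0 compares every column with the series row by row, aligned on index.
# NaN never equals NaN, so all-NaN rows come out False.
mask = df.eq(series, axis=0).any(axis=1)
```

Using any marks a row when at least one column matches the series; all would require every column to match.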
Upvotes: 0