user3062260

Reputation: 1644

Check if values of a Series appear in the same row of a different DataFrame

I'm trying to find out if the values of a series appear in another df. Here is a simple example:

import numpy as np
import pandas as pd

s = pd.Series([1.1, 2.2])
df = pd.DataFrame()
df['a'] = [1.1, 3.3]
df['b'] = [4.4, 5.5]

I want to check whether a value such as 1.1 is present in any column of df and, if it is, get True. I want this as a boolean mask so that I can use it as a selection criterion on another df.

I have tried:

>>> s.isin(df)             
0    False
1    False

also:

>>> s.values == df   
       a      b
0   True  False
1  False  False

This looks like it is on the right track, so I try to derive a mask from it:

>>> np.sum(s.values == df, axis=1) >0
0     True
1    False
dtype: bool

This looks good, but when I try it on my actual data I get an unusual error.

I am just running the following three print statements:

print(df)
print(series)
print(np.sum([series.values==df], axis=1) >0)

                  3rd        2nd        1st
Date
1995-01-03        NaN        NaN        NaN
1995-01-04        NaN        NaN        NaN
1995-01-05        NaN        NaN        NaN
1995-01-06        NaN        NaN        NaN
1995-01-09        NaN        NaN        NaN
...               ...        ...        ...
2021-05-19  19.531345  16.888084  15.596825
2021-05-20  19.422386  16.571087  14.174667
2021-05-21  19.283871  16.112516  14.281031
2021-05-24  19.167412  15.726680  14.438169
2021-05-25  19.157773  15.488945  14.616952

[6815 rows x 3 columns]
Date
1995-01-03          NaN
1995-01-04          NaN
1995-01-05          NaN
1995-01-06          NaN
1995-01-09          NaN
                ...
2021-05-19    14.016911
2021-05-20    14.174667
2021-05-21    14.281031
2021-05-24    14.438169
2021-05-25    14.616952
Name: series_name, Length: 6815, dtype: float64
CRITICAL:root:Global try/catch caught an unhandled exception
ERROR:root:Unable to coerce to Series, length must be 3: given 6815
ValueError: Unable to coerce to Series, length must be 3: given 6815

The sizes of the series and the dataframe seem to be correct, and the final 4 values in the series (ending in 14.616952) match the final column of the dataframe, so I would expect the mask to have a True value at those rows.

It would be really helpful if anyone could diagnose this bug. Many thanks.

Upvotes: 0

Views: 600

Answers (2)

RavinderSingh13

Reputation: 133518

Use the pandas .isin method as follows. Apply isin to the DataFrame with the series as its argument, then apply any along axis=1 so that you get a single True or False for each row.

df.isin(series).any(axis=1)

Output will be as follows:

0     True
1    False
dtype: bool
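
As a minimal sketch on the toy data from the question: when DataFrame.isin is given a Series, it matches on the index, so row i of df is compared with s[i] rather than with every value in s. That fits the "same row" requirement. The names s_shifted and other are hypothetical and only there for illustration.

```python
import pandas as pd

s = pd.Series([1.1, 2.2])
df = pd.DataFrame({"a": [1.1, 3.3], "b": [4.4, 5.5]})

# Series argument: comparison is aligned on the index (row by row).
mask = df.isin(s).any(axis=1)

# Hypothetical series holding df's values, but in different rows.
s_shifted = pd.Series([3.3, 1.1])
mask_same_row = df.isin(s_shifted).any(axis=1)          # index-aligned
mask_anywhere = df.isin(s_shifted.values).any(axis=1)   # plain membership

# Hypothetical second frame sharing the same index, filtered by the mask.
other = pd.DataFrame({"x": [10, 20]})
selected = other[mask]
```

If the goal were "value appears anywhere in df" rather than "in the same row", passing s.values (a plain array) instead of the Series switches isin to ordinary membership testing.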

Upvotes: 2

Quang Hoang

Reputation: 150745

You have an extra [] around your command:

print(df)
print(series)
print(np.sum([series.values==df], axis=1) >0)
             ^ remove this     ^ and this

Also consider all:

(series.values==df).all(axis=1)
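
One more thing that may be worth checking: the message "length must be 3: given 6815" suggests pandas is trying to match the 6815-element array against the 3 columns. In the toy example the series length happened to equal the number of columns, so the comparison ran. A sketch of a row-aligned comparison using DataFrame.eq with axis=0 follows; the data is made up to mimic the shape of the real frame and is not taken from the question.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data: 4 rows, 3 columns.
idx = pd.date_range("2021-05-20", periods=4, freq="D")
df = pd.DataFrame(
    {
        "3rd": [19.4, 19.3, 19.2, 19.1],
        "2nd": [16.6, 16.1, 15.7, 15.5],
        "1st": [np.nan, 14.3, 14.4, 14.6],
    },
    index=idx,
)
series = pd.Series([np.nan, 14.3, 14.0, 14.6], index=idx, name="series_name")

# Comparing a bare array with the frame lines the array up with the columns,
# which fails when len(series) differs from the number of columns.
try:
    series.values == df
    raised = False
except ValueError:
    raised = True

# axis=0 lines the Series up with the row index instead.
mask = df.eq(series, axis=0).any(axis=1)
```

Note that NaN never compares equal to NaN, so rows where both sides are NaN come out False in the mask.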

Upvotes: 0
