Pandas isin() not working properly with numerical values

Question

I have a pandas DataFrame where one column contains only floats, and another column contains either a list of floats, None, or a single float value. I have ensured all values are floats.

Ultimately, I want to use pandas' isin() to check how many records of value_1 are in value_2, but it is not working for me. I ran the code below:

df[~df['value_1'].isin(df['value_2'])]

This is what it returned, which is not what I expected, since some values in value_1 are clearly in the value_2 lists:

    value_1                       value_2
0   88870.0                     [88870.0]
1  150700.0                          None
2  225000.0          [225000.0, 225000.0]
3  305000.0  [305606.0, 305000.0, 1067.5]
4  392000.0                    [392000.0]
5  198400.0                           396

What am I missing? Please help.

mozway · Accepted Answer

You can use boolean indexing with numpy.isin in a list comprehension:

import numpy as np

out = df[[bool(np.isin(v1, v2)) for v1, v2 in zip(df['value_1'], df['value_2'])]]

Output:

    value_1                       value_2
0   88870.0                     [88870.0]
2  225000.0          [225000.0, 225000.0]
3  305000.0  [305606.0, 305000.0, 1067.5]
4  392000.0                    [392000.0]
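
For context on why the original attempt returns every row: Series.isin tests each value_1 against the cells of value_2 taken as whole objects, so a float is compared with a list (or None) and never matches; it does not look inside each row's list. The sketch below rebuilds the sample data from the question and applies the same row-wise check. The helper name row_contains and the explicit None guard are assumptions added for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's printed output.
df = pd.DataFrame({
    "value_1": [88870.0, 150700.0, 225000.0, 305000.0, 392000.0, 198400.0],
    "value_2": [
        [88870.0],
        None,
        [225000.0, 225000.0],
        [305606.0, 305000.0, 1067.5],
        [392000.0],
        396,
    ],
})

def row_contains(v1, v2):
    """Hypothetical helper: is v1 found in v2 (a list, a scalar, or None)?"""
    if v2 is None:
        # Treat a missing value_2 as "no match" explicitly.
        return False
    # np.isin accepts a scalar or a list as the second argument.
    return bool(np.isin(v1, v2))

# Build the boolean mask one row at a time.
mask = [row_contains(v1, v2) for v1, v2 in zip(df["value_1"], df["value_2"])]

matched = df[mask]                         # value_1 found in its own value_2
unmatched = df[[not m for m in mask]]      # what the question was after with ~
```

Here sum(mask) gives the count of matching records, and unmatched is the row-wise equivalent of the negated filter in the question.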
