filter a dataframe with vectorization

Question

I have the followings:

df = pd.DataFrame({"value":['A','B','B','C','B'],'my_list':[['J1','J4'],['J2','J9','J1'],['J0','J9','J2'],['J2'],['V13','X9','J1']]})

    value   my_list
0   A   [J1, J4]
1   B   [J2, J9, J1]
2   B   [J0, J9, J2]
3   C   [J2]
4   B   [V13, X9, J1]

and I wish to find all values B that does not have J2 in it. If using lambda I can do:

df.apply(lambda x: x['value']=='B' and 'J2' not in x['my_list'], axis=1)


0    False
1    False
2    False
3    False
4     True

But I wish to do something like this:

df[(df['value']=='B') & ('J2' not in df['my_list'])]

doable?

anky · Accepted Answer

Using list, dict etc in a pandas dataframe loses the befinifit of vectorization. If storing these values with these dtypes are mandatory, list comprehension works faster:

df['value'].eq("B")&['J2' not in i for i in df['my_list']]

Other methods:

Converting to dataframe looks like:

df['value'].eq('B') & ~pd.DataFrame(df['my_list'].tolist()).isin(['J2']).any(1)

Or converting to string :

df['value'].eq('B')& ~df['my_list'].astype(str).str.contains("J2")

And:

df['value'].eq("B")&df['my_list'].astype(str).str.count("J2").ne(1)

filter a dataframe with vectorization

Answers (1)

Related Questions