Pandas Dataframe - How to check if the string value in column A is available in the list of string items in column B

Question

Here is my DataFrame, which has two columns: column A contains a string and column B contains a list of strings.

import pandas as pd

df = pd.DataFrame(columns=['A','B'])
df.loc[0] = ['apple',['orange','banana','blueberry']]
df.loc[1] = ['orange',['orange','banana','avocado']]
df.loc[2] = ['blueberry',['apple','banana','blueberry']]
df.loc[3] = ['cherry',['apple','orange','banana']]

print(df)

           A                            B
0      apple  [orange, banana, blueberry]
1     orange    [orange, banana, avocado]
2  blueberry   [apple, banana, blueberry]
3     cherry      [apple, orange, banana]

For each row, I want to check whether the value in column A appears in the list in column B of the same row. The expected output is:

0 False
1 True
2 True
3 False

I tried `isin`, which works when checking against a static list:

df.A.isin(['orange','banana','blueberry'])
0    False
1     True
2    False
3    False

However, when I try to use it to check against the lists stored in the DataFrame, it does not work:

df.A.isin(df.B)
TypeError: unhashable type: 'list'
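Why `isin` is the wrong tool here, even once the error goes away: it tests each value of A against one shared collection, not against the list in its own row. The sketch below (flattening B with `Series.explode` is my own choice, not from the question) makes B hashable, but the result is then membership in the union of all lists, which wrongly marks row 0 as True:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['apple', 'orange', 'blueberry', 'cherry'],
    'B': [['orange', 'banana', 'blueberry'],
          ['orange', 'banana', 'avocado'],
          ['apple', 'banana', 'blueberry'],
          ['apple', 'orange', 'banana']],
})

# explode() flattens the lists into one Series of plain strings,
# so isin() no longer sees unhashable lists...
flat = df.B.explode()

# ...but isin() checks each A value against ALL items from ALL rows,
# so 'apple' in row 0 matches the 'apple' found in the lists of rows 2 and 3.
union_check = df.A.isin(flat)
print(union_check.tolist())  # [True, True, True, False]
```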

I would like to avoid a for loop and lambda if a pure pandas solution is available.

Any help is greatly appreciated.

piRSquared · Accepted Answer

Fun with `sets`

df.A.apply(lambda x: set([x])) <= df.B.apply(set)

0    False
1     True
2     True
3    False
dtype: bool
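Why this works: for object-dtype Series, pandas applies a comparison operator element by element, so each row evaluates Python's own set comparison, and for sets `a <= b` means "a is a subset of b". A minimal self-contained sketch of the same idea, written with the set literal `{x}` instead of `set([x])` (an equivalent spelling):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['apple', 'orange', 'blueberry', 'cherry'],
    'B': [['orange', 'banana', 'blueberry'],
          ['orange', 'banana', 'avocado'],
          ['apple', 'banana', 'blueberry'],
          ['apple', 'orange', 'banana']],
})

singletons = df.A.apply(lambda x: {x})   # each A value wrapped in a one-item set
item_sets = df.B.apply(set)              # each list in B converted to a set

# Row-wise subset test: {A} <= set(B) is True exactly when A is in B.
result = singletons <= item_sets
print(result.tolist())  # [False, True, True, False]
```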

No loops

This version avoids explicit Python loops, but in practice I'd still use @jezrael's list comprehension.

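# Spread each list across columns (shorter lists are padded with missing values),
# compare every column with A row by row, and flag rows with any match.
pd.DataFrame(df.B.tolist(), df.index).eq(df.A, axis=0).any(axis=1)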

0    False
1     True
2     True
3    False
dtype: bool
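jezrael's comment is not reproduced in this thread, so the following is only an assumption about what a comprehension-based version could look like: zip the two columns and test membership row by row. It is an explicit Python loop, which the question hoped to avoid, but it is short and does not care how long each list is.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['apple', 'orange', 'blueberry', 'cherry'],
    'B': [['orange', 'banana', 'blueberry'],
          ['orange', 'banana', 'avocado'],
          ['apple', 'banana', 'blueberry'],
          ['apple', 'orange', 'banana']],
})

# Pair each A value with the list in the same row and test membership.
result = pd.Series([a in b for a, b in zip(df.A, df.B)], index=df.index)
print(result.tolist())  # [False, True, True, False]
```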

Numpy broadcasting

This only works if every list in B has the same length.

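import numpy as np

# Transposing B's (n_rows, list_len) array to (list_len, n_rows) lets A broadcast
# across each list position; any(0) flags rows with at least one match.
pd.Series(
    np.char.equal(df.A.values.astype(str), np.array(df.B.tolist()).T).any(0),
    df.index
)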

0    False
1     True
2     True
3    False
dtype: bool
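If the lists in B have different lengths, one possible workaround (my own sketch, not from the answer) is to let the `pd.DataFrame` constructor pad the lists into a rectangular array first, and then broadcast A as a column vector against it. The ragged data below is made up for illustration.

```python
import pandas as pd

# Hypothetical data where the lists have different lengths.
df = pd.DataFrame({
    'A': ['apple', 'orange', 'blueberry', 'cherry'],
    'B': [['orange', 'banana'],
          ['orange', 'banana', 'avocado'],
          ['apple', 'banana', 'blueberry'],
          ['apple']],
})

# The DataFrame constructor pads shorter rows with missing values,
# giving an (n_rows, max_len) object array.
padded = pd.DataFrame(df.B.tolist(), index=df.index).to_numpy()

# Broadcast A as an (n_rows, 1) column against every padded position;
# the padding never equals a string, so it never counts as a match.
matches = padded == df.A.to_numpy()[:, None]
result = pd.Series(matches.any(axis=1), index=df.index)
print(result.tolist())  # [False, True, True, False]
```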

`pd.get_dummies`

df.B.str.join('|').str.get_dummies().mul(pd.get_dummies(df.A)).any(axis=1)

0    False
1     True
2     True
3    False
dtype: bool
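To see why the `get_dummies` version works, it helps to look at the intermediate frames: `str.get_dummies` one-hot encodes each row's list, `pd.get_dummies` one-hot encodes A, and `mul` aligns the two on their column labels (the union of both, with NaN where a label exists on only one side). A product is nonzero only where A's value also appears in that row's list. The variable names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['apple', 'orange', 'blueberry', 'cherry'],
    'B': [['orange', 'banana', 'blueberry'],
          ['orange', 'banana', 'avocado'],
          ['apple', 'banana', 'blueberry'],
          ['apple', 'orange', 'banana']],
})

list_dummies = df.B.str.join('|').str.get_dummies()  # one column per fruit seen in B
value_dummies = pd.get_dummies(df.A)                  # one column per fruit seen in A

# mul() aligns on column labels; labels present on one side only become NaN.
product = list_dummies.mul(value_dummies)
print(sorted(product.columns))

# any(axis=1) skips NaN and is True where at least one product is nonzero.
result = product.any(axis=1)
print(result.tolist())  # [False, True, True, False]
```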

`np.bincount`

I like this one (-:
However, jezrael notes that it performs poorly )-: so beware.

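import numpy as np

# Row number of every flattened list item, e.g. [0, 0, 0, 1, 1, 1, ...].
i = np.arange(len(df)).repeat(df.B.str.len())
# Sum, per row, how many items equal that row's A value; nonzero means found.
pd.Series(
    np.bincount(i, df.A.values[i] == np.concatenate(df.B)).astype(bool),
    df.index
)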

0    False
1     True
2     True
3    False
dtype: bool
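A step-by-step version of the `np.bincount` trick, with the intermediates given names of my own: every list item is tagged with the row it came from, each item is compared with that row's A value, and `bincount` sums the True hits per row. A sum above zero means A occurs in that row's list (the sum also counts duplicates, if any).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['apple', 'orange', 'blueberry', 'cherry'],
    'B': [['orange', 'banana', 'blueberry'],
          ['orange', 'banana', 'avocado'],
          ['apple', 'banana', 'blueberry'],
          ['apple', 'orange', 'banana']],
})

lengths = df.B.map(len).to_numpy()                # items per row: [3, 3, 3, 3]
row_ids = np.repeat(np.arange(len(df)), lengths)  # source row of every flattened item
flat = np.concatenate(df.B.to_list())             # all list items, in row order

# Compare each flattened item with the A value of the row it came from.
hits = df.A.to_numpy(dtype=str)[row_ids] == flat

# Sum the hits per row; minlength guarantees one slot per row of df.
counts = np.bincount(row_ids, weights=hits, minlength=len(df))
result = pd.Series(counts > 0, index=df.index)
print(result.tolist())  # [False, True, True, False]
```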
