How to test whether a pandas series contains elements from another list (or NumPy array or pandas series)?

Question

Assume that I have this DataFrame (Animals column is of type pandas.Series):

ID	Animals
1	[cat, dog, chicken]
2	[penguin]

And these lists (they can be NumPy arrays or pandas Series if that is better for performance):

mammals = ['cat', 'dog', 'cow', 'sheep']
birds = ['chicken', 'duck', 'penguin']

I want to add two columns to my DataFrame, ContainsBirds and ContainsMammals, based on the contents of the Animals column.

Here is the final expected output:

ID	Animals	ContainsBirds	ContainsMammals
1	[cat, dog, chicken]	1.0	1.0
2	[penguin]	1.0	0.0

jezrael · Accepted Answer

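You can put the lists in a dictionary and, for each one, test whether a row matches at least one value: convert the list to a set, call isdisjoint on each row, and negate the result. Cast the booleans to floats with .astype(float) if you need 0.0 and 1.0; for 0 and 1, use .astype(int) instead: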

d = {'Birds':birds, 'Mammals':mammals}

for k, v in d.items():
    df[f'Contains{k}'] = (~df['Animals'].map(set(v).isdisjoint)).astype(float)
print(df)
   ID              Animals  ContainsBirds  ContainsMammals
0   1  [cat, dog, chicken]            1.0              1.0
1   2            [penguin]            1.0              0.0
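
For reference, here is a self-contained sketch of the same approach that also builds the sample DataFrame from the question, so it can be run as-is. The variable names groups, name, members, member_set and has_match are only illustrative, and it casts to int (0/1) rather than float; swap in .astype(float) to reproduce the 0.0/1.0 output above.

```python
import pandas as pd

# Sample data from the question: each cell in "Animals" holds a Python list.
df = pd.DataFrame({
    "ID": [1, 2],
    "Animals": [["cat", "dog", "chicken"], ["penguin"]],
})
mammals = ["cat", "dog", "cow", "sheep"]
birds = ["chicken", "duck", "penguin"]

# Map each column suffix to the list of values to look for.
groups = {"Birds": birds, "Mammals": mammals}

for name, members in groups.items():
    member_set = set(members)
    # set.isdisjoint(row) is True when the row shares NO element with the set,
    # so negating it gives "row contains at least one element of the set".
    has_match = ~df["Animals"].map(member_set.isdisjoint)
    df[f"Contains{name}"] = has_match.astype(int)

print(df)
```

set.isdisjoint accepts any iterable, so the lists stored in the Animals column do not need to be converted to sets first; only the lookup list is converted, once per group.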
