Ambitions

Reputation: 2581

How to test whether a pandas series contains elements from another list (or NumPy array or pandas series)?

Assume that I have this DataFrame (the Animals column is of type pandas.Series):

ID Animals
1 [cat, dog, chicken]
2 [penguin]

And these lists (they could be NumPy arrays or pandas Series if that is better for performance):

mammals = ['cat', 'dog', 'cow', 'sheep']
birds = ['chicken', 'duck', 'penguin']

What I am trying to do is add two columns to my DataFrame, ContainsBirds and ContainsMammals, based on the contents of the Animals column.

Here is the final expected output:

ID Animals ContainsBirds ContainsMammals
1 [cat, dog, chicken] 1.0 1.0
2 [penguin] 1.0 0.0
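
For anyone who wants to reproduce this, here is a minimal sketch of the setup. It assumes each Animals cell holds a plain Python list, which is what the display above suggests; the real data may be stored differently.

```python
import pandas as pd

# Sample data copied from the question.
# Assumption: each Animals cell is a Python list of strings.
df = pd.DataFrame({
    "ID": [1, 2],
    "Animals": [["cat", "dog", "chicken"], ["penguin"]],
})

mammals = ["cat", "dog", "cow", "sheep"]
birds = ["chicken", "duck", "penguin"]
```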

Upvotes: 2

Views: 67

Answers (2)

mozway

Reputation: 260420

Using a list comprehension:

lists = [birds, mammals]
names = ['Birds', 'Mammals']

df[names] = [[int(bool(set(l).intersection(x))) for l in lists]
             for x in df['Animals']]

output:

   ID              Animals  Birds  Mammals
0   1  [cat, dog, chicken]      1        1
1   2            [penguin]      1        0
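
To make the per-row check easier to follow, here is the same idea in plain Python, without pandas. This is only a sketch: `rows` stands in for `df['Animals']`, and building the sets once up front (the `sets` name is mine, not from the answer) is an optional variation that avoids recreating them for every row.

```python
birds = ["chicken", "duck", "penguin"]
mammals = ["cat", "dog", "cow", "sheep"]
lists = [birds, mammals]

# Stand-in for df['Animals']: one list of animals per row.
rows = [["cat", "dog", "chicken"], ["penguin"]]

# Build each lookup set once instead of once per row.
sets = [set(l) for l in lists]

# A non-empty intersection is truthy, so bool(...) says whether the row
# shares at least one element with the list; int(...) turns that into 0/1.
flags = [[int(bool(s.intersection(x))) for s in sets] for x in rows]
```

Each inner list in `flags` holds one value per lookup list, in the same order as `lists`, which is why it can be assigned to the `names` columns in one step.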

Upvotes: 1

jezrael

Reputation: 862511

You can create a dictionary and, for each entry, test whether at least one value matches by converting the list to a set and using isdisjoint. Cast the booleans to floats if you need 0.0 and 1.0; for 0 and 1, use .astype(int):

d = {'Birds':birds, 'Mammals':mammals}

for k, v in d.items():
    df[f'Contains{k}'] = (~df['Animals'].map(set(v).isdisjoint)).astype(float)
print(df)

   ID              Animals  ContainsBirds  ContainsMammals
0   1  [cat, dog, chicken]            1.0              1.0
1   2            [penguin]            1.0              0.0
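
The negation is the part that tends to trip people up, so here is a small sketch that splits the one-liner into steps for the mammals list. The intermediate names (`mammals_set`, `no_mammals`, `has_mammals`) are mine, and the explicit `.astype(bool)` is an extra safeguard I added so that `~` always acts as a logical NOT; the answer above does not need it for this data.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2],
    "Animals": [["cat", "dog", "chicken"], ["penguin"]],
})
mammals_set = set(["cat", "dog", "cow", "sheep"])

# isdisjoint returns True when the row shares NO element with the set.
no_mammals = df["Animals"].map(mammals_set.isdisjoint).astype(bool)

# Invert it to get "shares at least one element".
has_mammals = ~no_mammals

# Cast to float for 1.0/0.0 (use int for 1/0).
df["ContainsMammals"] = has_mammals.astype(float)
```

Because `isdisjoint` can stop at the first shared element, it does not have to build the full intersection for each row.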

Upvotes: 1
