john doe
john doe

Reputation: 2253

How to compute the indexed intersection between two similar pandas columns?

I have the following list:

new_pets = ['Bobcats', 'dog', 'cat', 'turtle', 'monkey', 'goat', 'ferret', 'pig', 'Chipmunks', 'Capybaras', 'Ducks']

And the following pandas dataframe:

In: df

0   Cats
1   Lizard
2   Dog
3   Baby Ferrets
4   Pig
5   Armadillo

How can I get into a new column the elements of new_pets which appear in df (*)?:

In: df['new_col']

0   True
1   False
2   True
3   True
4   True
5   False 

From the docs, I noticed that this can be done with contains(), so I tried the following:

result = df[df['pets'].str.contains(x, case = False) for x in new_pets]

However, I am not sure if this is possible. For instance, is it possible to match Baby Ferrets with ferret, since Ferrets is similar to ferret?. For that constraint, I tried to use case=False, but I am not getting the expected outcome (*). Any idea of how to retrive such strings in a new dataframe?.

Upvotes: 1

Views: 46

Answers (1)

jezrael
jezrael

Reputation: 863431

You can first join values by | (regex or is |) and convert all values to lower-case by lower - output is in joined. Then lowercase all values in column by str.lower and call str.contains with joined for checking if bobcats or dog or dog ... is in column:

print (df)
           pets
0          Cats
1        Lizard
2           Dog
3  Baby Ferrets
4           Pig
5     Armadillo

joined = '|'.join(new_pets).lower()

df['new_col'] = df['pets'].str.lower().str.contains(joined)
print (df)
              a new_col
0          Cats    True
1        Lizard   False
2           Dog    True
3  Baby Ferrets    True
4           Pig    True
5     Armadillo   False

Upvotes: 2

Related Questions