Reputation: 521
I have a dataframe where column Name
contains values such as the following (the rest of the columns do not affect how this question is answered I hope):
Chicken
Chickens
Fluffy Chicken
Whale
Whales
Blue Whale
Tiger
White Tiger
Big Tiger
Now, I want to ensure that we rename these entries to be like the following:
Chicken
Chicken
Chicken
Whale
Whale
Whale
Tiger
Tiger
Tiger
Essentially substituting anything that has 'Chicken' to just be 'Chicken, anything with 'Whale' to be just 'Whale, and anything with 'Tiger' to be just 'Tiger'.
What is the best way to do this? There are almost 1 million rows in the dataframe.
Sorry just to add, I have a list of what we expect i.e.
['Chicken', 'Whale', 'Tiger']
They can appear in any order in the column
What I should also add is, the column might contain things like "Mushroom" or "Eggs" that do not need substituting from the original list.
Upvotes: 0
Views: 433
Reputation: 323226
Try with str.extract
#l = ['Chicken', 'Whale', 'Tiger']
df['new'] = df['col'].str.extract('('+'|'.join(l)+')')[0]
Out[10]:
0 Chicken
1 Chicken
2 Chicken
3 Whale
4 Whale
5 Whale
6 Tiger
7 Tiger
8 Tiger
Name: 0, dtype: object
Upvotes: 2