Reputation: 67
Let's say I want to replace every string that contains "arm", "hay" or "Arm" with the string "Armenian", all in one step. (For example: armenia -> Armenian, hayeren -> Armenian, etc.)
Here is what I tried:
> df[col] = df[col].apply(lambda x : 'Armenian' if ["Arm","hay","arm",] in x else x)
And I get:
TypeError: 'in <string>' requires string as left operand, not list
Upvotes: 1
Views: 898
Reputation: 8012
str objects support the in operator only when the left operand is also a string. For example:
>>> 'arm' in 'I broke my arm'
True
So you get an error when you try to use it with a list on the left-hand side:
>>> ['Arm', 'arm'] in 'I broke my arm'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not list
If you want to check whether a string contains any term from a list, I suggest using the re module to build a regular expression and match it against the string:
>>> import re
>>> terms = ['Arm', 'arm', 'hay']
>>> regex = re.compile('|'.join('(%s)' % term for term in terms))
>>> regex.search('I broke my arm')
<_sre.SRE_Match object; span=(11, 14), match='arm'>
You can build a function from that:
>>> def replace_from_list(terms, substitute, flags=0):
...     # flags go to re.compile(); the second argument of search() is a start position
...     regex = re.compile('|'.join('(%s)' % term for term in terms), flags)
...     def inner_replace(s):
...         return substitute if regex.search(s) else s
...     return inner_replace
...
>>> f = replace_from_list(['arm', 'hay'], 'Armenian', re.IGNORECASE)
>>> f('I broke my arm')
'Armenian'
>>> f('I broke my leg')
'I broke my leg'
>>>
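One caveat with the function above: the terms are inserted into the pattern as-is, so a term containing regex metacharacters is interpreted as a pattern rather than as literal text. A minimal sketch of a variant that escapes each term with re.escape follows; the name build_matcher and the sample strings are hypothetical, chosen only for illustration.

```python
import re

def build_matcher(terms, ignore_case=True):
    """Compile a pattern that matches any of the given terms literally."""
    # re.escape turns characters such as '(' or '.' into literal matches
    pattern = "|".join(re.escape(term) for term in terms)
    return re.compile(pattern, re.IGNORECASE if ignore_case else 0)

# "(hy)" is a hypothetical term that contains regex metacharacters
matcher = build_matcher(["arm", "hay", "(hy)"])

labels = ["armenia", "Hayeren", "language (hy)", "rhythm", "english"]
result = ["Armenian" if matcher.search(s) else s for s in labels]
print(result)
```

Without the escaping, "(hy)" would be read as a group matching "hy", so "rhythm" would be replaced as well; with it, only the literal text "(hy)" matches.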
Note that I do not know Pandas, but it seems to have a facility that does exactly what you need. See Jon Clements' answer.
Upvotes: 0
Reputation: 1032
import numpy as np
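df[col] = np.where(df[col].str.contains("Arm|hay|arm"), "Armenian", df[col])
This will do the job. Note that str.contains expects a single regular-expression pattern rather than a tuple, so the alternatives are joined with |.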
Upvotes: 2
Reputation: 2083
I tried this:
df['col'] = list(map(lambda x : 'Armenian' if any(item in x for item in ["Arm","hay","arm"]) else x, df['col']))
Upvotes: -1
Reputation: 142206
You can use:
df.loc[df['col'].str.contains('(?i)hay|arm'), 'col'] = 'Armenian'
str.contains returns a boolean Series that is True wherever the value contains "hay" or "arm" anywhere in the string; the (?i) prefix makes the match case-insensitive. df.loc then uses that mask to select the matching rows and assigns "Armenian" to the column for those rows only.
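As a small end-to-end sketch of the mask-and-assign approach described above: the sample data is made up, and case=False and na=False are swapped in here as an alternative to the inline (?i) flag, so that a missing value counts as a non-match instead of breaking the boolean indexing.

```python
import pandas as pd

# Hypothetical sample data; the column name 'col' follows the answer above
df = pd.DataFrame({"col": ["armenia", "hayeren", "Armenian", "english", None]})

# case=False: case-insensitive match (equivalent in effect to the (?i) prefix)
# na=False: missing values become False, so the mask is purely boolean
mask = df["col"].str.contains("hay|arm", case=False, na=False)

# Assign the replacement only to the rows where the mask is True
df.loc[mask, "col"] = "Armenian"
print(df)
```

The first three rows end up as "Armenian", while "english" and the missing value are left untouched.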
Upvotes: 3