Reputation: 67

Replace multiple strings simultaneously if it contains substring

Let's say I want to replace every whole string that contains the substring "arm", "hay", or "Arm" with the string "Armenian", all in one step. (For example: armenia -> Armenian, hayeren -> Armenian, etc.)

Here is what I tried:

df[col] = df[col].apply(lambda x : 'Armenian' if ["Arm","hay","arm",] in x else x)

And I get:

TypeError: 'in <string>' requires string as left operand, not list

Upvotes: 1

Answers (4)

mg.

Reputation: 8012

str objects support the in operator only with another string as the left operand. For example:

>>> 'arm' in 'I broke my arm'
True

So you get an error when you try to use it with a list:

>>> ['Arm', 'arm'] in 'I broke my arm'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not list

If you want to check whether a string contains any term from a list, I suggest using the re module to build a regular expression and match it against the string:

>>> import re
>>> terms = ['arm', 'hay']
>>> regex = re.compile('|'.join('(%s)' % term for term in terms))
>>> regex.search('I broke my arm')
<_sre.SRE_Match object; span=(11, 14), match='arm'>

You can build a function from that:

>>> def replace_from_list(terms, substitute, flags=0):
...     # Flags such as re.IGNORECASE belong to re.compile, not to search()
...     regex = re.compile('|'.join('(%s)' % term for term in terms), flags)
...     def inner_replace(s):
...         return substitute if regex.search(s) else s
...     return inner_replace
... 
>>> f = replace_from_list(['arm', 'hay'], 'Armenian', re.IGNORECASE)
>>> f('I broke my arm')
'Armenian'
>>> f('I broke my leg')
'I broke my leg'
>>>

Note that I do not know Pandas, but it seems to have a facility that does exactly what you need. See Jon Clements' answer.
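If the terms might contain characters that are special in a regular expression (such as "." or "+"), it is probably safer to escape them before joining. A minimal sketch of that variation; `make_replacer` and `to_armenian` are hypothetical names used only for illustration:

```python
import re

def make_replacer(terms, substitute, flags=0):
    """Return a function that maps any string containing a term to `substitute`."""
    # re.escape makes each term match literally, even if it contains
    # characters such as '.' or '+' that are special in a regex.
    pattern = re.compile('|'.join(re.escape(term) for term in terms), flags)

    def replace(s):
        # Replace the whole string on a match, otherwise return it unchanged.
        return substitute if pattern.search(s) else s

    return replace

# Hypothetical usage with the terms from the question.
to_armenian = make_replacer(['arm', 'hay'], 'Armenian', re.IGNORECASE)
print(to_armenian('ARMENIA'))
print(to_armenian('hayeren'))
print(to_armenian('french'))
```

The flags are passed to re.compile, so re.IGNORECASE covers "Arm" and "arm" with a single term.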

Upvotes: 0

Bartek Malysz

Reputation: 1032

import numpy as np
# str.contains expects a single pattern string, so join the alternatives with "|"
df[col] = np.where(df[col].str.contains("Arm|hay|arm"), "Armenian", df[col])

This will do the job

Upvotes: 2

mastisa

Reputation: 2083

I tried this:

df['col'] = list(map(lambda x : 'Armenian' if any(item in x for item in ["Arm","hay","arm"]) else x, df['col']))

Upvotes: -1

Jon Clements

Reputation: 142206

You can use:

df.loc[df['col'].str.contains('(?i)hay|arm'), 'col'] = 'Armenian'

This checks, case-insensitively, whether each value in the column contains "hay" or "arm" anywhere. The result is a boolean array that selects the matching rows of the original dataframe, and "Armenian" is assigned to the column in those rows.
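For a self-contained illustration of the same idea, here is a sketch on made-up sample data. The dataframe contents are hypothetical, and the `case=False` and `na=False` arguments are a variation on the inline `(?i)` flag rather than what the line above uses:

```python
import pandas as pd

# Hypothetical sample data; the column name 'col' follows the line above.
df = pd.DataFrame({'col': ['armenia', 'hayeren', 'Armenian (east)', 'french', None]})

# case=False ignores case; na=False treats missing values as "no match",
# so the mask is purely boolean and safe to use with .loc.
mask = df['col'].str.contains('hay|arm', case=False, na=False)

# Overwrite the whole value in every matching row.
df.loc[mask, 'col'] = 'Armenian'

print(df)
```

Passing `na=False` matters when the column has missing values, since otherwise the mask contains missing entries and cannot be used for boolean indexing.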

Upvotes: 3
