Python - Replacing words from list in DataFrame with Regex pattern

Question

I have the following list and DataFrame:

import numpy as np
import pandas as pd

mylist = ['foo', 'bar', 'baz']
df = pd.DataFrame({'Col1': ['fooThese', 'barWords', 'baz are', 'FOO: not', 'bAr:- needed'],
                   'Col2': ['Baz:Neither', 'Foo Are', 'barThese', np.nan, 'but this is fine']})

I want to remove the strings in mylist wherever they appear in the DataFrame. I am able to remove some of them using the following regex pattern:

pat = '|'.join([r'\b{}'.format(w) for w in mylist])
df2 = df.replace(pat, '', regex=True)

However, this doesn't replace all the instances. My desired output is the following:

    Col1     Col2
0   These    Neither
1   Words    Are
2   are      These
3   not      NaN
4   needed   but this is fine

Erfan · Accepted Answer

You have to use the (?i) regex flag, which makes the replacements case-insensitive, and also remove the special characters:

mydict = {f'(?i){word}': '' for word in mylist}
df2 = df.replace(mydict, regex=True).replace('[:-]', '', regex=True)

      Col1              Col2
0    These           Neither
1    Words               Are
2      are             These
3      not               NaN
4   needed  but this is fine
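
To see what the (?i) flag changes, here is a small check using only the standard re module. It is a sketch rather than part of the answer: the sample values are copied from the DataFrame in the question, and the names values, plain and ignore_case are just for illustration.

```python
import re

mylist = ['foo', 'bar', 'baz']
values = ['fooThese', 'FOO: not', 'bAr:- needed', 'Baz:Neither']

# Pattern from the question: case-sensitive by default.
pat = '|'.join(r'\b{}'.format(w) for w in mylist)

# Apply the pattern without and with the inline case-insensitive flag.
plain = [re.sub(pat, '', v) for v in values]
ignore_case = [re.sub('(?i)' + pat, '', v) for v in values]

for value, before, after in zip(values, plain, ignore_case):
    print(f'{value!r}: {before!r} vs {after!r}')
```

Without the flag only 'fooThese' changes; with it, 'FOO: not' becomes ': not' and 'bAr:- needed' becomes ':- needed'. The leftover ':' and '-' are why the answer also replaces '[:-]'.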

Or you can add the special characters to your dictionary, so you don't have to call DataFrame.replace twice:

mydict = {f'(?i){word}': '' for word in mylist}
mydict['[:-]'] = ''
df2 = df.replace(mydict, regex=True)

      Col1              Col2
0    These           Neither
1    Words               Are
2      are             These
3      not               NaN
4   needed  but this is fine
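
One thing the printed output hides: removing only the word and the ':' or '-' leaves the following space in place, so 'baz are' becomes ' are' with a leading space. As a possible variation (not the answerer's code), the sketch below folds everything into a single pattern that also consumes any ':', '-' or whitespace directly after a matched word. It assumes the words may contain regex metacharacters, hence re.escape; the names words and pat are illustrative.

```python
import re

import numpy as np
import pandas as pd

mylist = ['foo', 'bar', 'baz']
df = pd.DataFrame({
    'Col1': ['fooThese', 'barWords', 'baz are', 'FOO: not', 'bAr:- needed'],
    'Col2': ['Baz:Neither', 'Foo Are', 'barThese', np.nan, 'but this is fine'],
})

# Alternation of the escaped words, followed by any run of ':', '-' or whitespace.
words = '|'.join(re.escape(w) for w in mylist)
pat = rf'(?i)(?:{words})[:\-\s]*'

# A single replace call; NaN values are left untouched.
df2 = df.replace(pat, '', regex=True)
print(df2)
```

Unlike the '[:-]' entry in the dictionary, this only strips separators that immediately follow a matched word, so a hyphen or colon elsewhere in the text is kept. Like the answer, it matches the words anywhere in a string; putting \b in front of the group, as in the question's original pattern, would restrict matches to the start of a word.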
