Sveta

Reputation: 171

Multiple regex replacements in a loop with pandas

I am relatively new to regex, and I am trying to replace part of the strings in a string column of a pandas DataFrame. The challenge is that there are several different substrings I want to remove from the column while keeping the rest of each string.

I have code that works for one substring, but when I try to use a for loop, it does not work. I am not sure how to reference the loop variable inside the regex pattern.

Here is code that works when applied to one substring:

import pandas as pd
df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'],'B': ['abc', 'bar', 'xyz']})
df
df=df.replace({'A': r'^ba ca'}, {'A': ''}, regex=True)
df

Here is the code that does not work when I use a for loop:

df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'],'B': ['abc', 'bar', 'xyz']})
replace_list=['ba ca','foo']
for i in replace_list:
    df=df.replace({'A': r'^(i)'}, {'A': ''}, regex=True)
df

I would like to iterate over a list of strings and remove each of them from a column in the DataFrame.

Upvotes: 0

Views: 830

Answers (2)

cs95

Reputation: 402443

'^(i)' does not perform string interpolation: it is a regex that matches a literal "i" at the start of the string, whatever the value of the loop variable i. You're looking for something along the lines of f-string formatting (rf'^{i}') or str.format (r'^{}'.format(i)).
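A quick way to see the difference is to build the patterns and test them with Python's standard `re` module. This is a minimal sketch using only the sample string and list from the question; the variable names are illustrative.

```python
import re

replace_list = ['ba ca', 'foo']

# r'^(i)' is regex text, not a template: it always means "an 'i' at the start",
# captured in group 1, no matter what the loop variable i holds.
literal_pattern = r'^(i)'

# Both of these insert the *value* of i into the pattern text.
patterns_fstring = [rf'^{i}' for i in replace_list]
patterns_format = [r'^{}'.format(i) for i in replace_list]

print(patterns_fstring)   # ['^ba ca', '^foo']
print(patterns_format)    # ['^ba ca', '^foo']

print(re.match(literal_pattern, 'ba ca t'))                  # None
print(re.match(patterns_fstring[0], 'ba ca t') is not None)  # True
```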

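A better solution here, though, is to ditch the loop, since replace can perform multiple replacements at once. Note that these patterns have no ^ anchor, so each substring is removed wherever it occurs in the string, not only at the start.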

df.replace({'A': replace_list}, '', regex=True)

      A    B
0     t  abc
1        bar
2  bait  xyz

Or, with str.replace:

df['A'].str.replace('|'.join(replace_list), '', regex=True)

0       t
1        
2    bait
Name: A, dtype: object
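`Series.str.replace` only treats the pattern as a regex when asked: since pandas 2.0 the default is `regex=False`, so passing `regex=True` explicitly is the safer choice across versions. The sketch below additionally escapes each list item with `re.escape` (so characters such as `.` or `(` are matched literally) and anchors the alternation with `^(?:...)` so only prefixes are removed, as in the question's `^ba ca` pattern. Both of those are additions for illustration, not part of the answer above.

```python
import re

import pandas as pd

df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'], 'B': ['abc', 'bar', 'xyz']})
replace_list = ['ba ca', 'foo']

# Escape each item, then join them into one anchored alternation,
# so a match is only removed when it starts the string.
pattern = '^(?:' + '|'.join(map(re.escape, replace_list)) + ')'

# regex=True is required in pandas >= 2.0, where str.replace defaults to literal matching.
df['A'] = df['A'].str.replace(pattern, '', regex=True)
print(df)
```

A single alternation scans each string once, whereas a loop makes one pass over the column per pattern.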

This post by me should also be worth a read: What is the difference between Series.replace and Series.str.replace?

Upvotes: 3

Sebastien D

Reputation: 4482

Since you want the value of i to be inserted into your regex pattern, consider this change:

for i in replace_list:
    df = df.replace({'A': r'^({})'.format(i)}, {'A': ''}, regex=True)

Output

+----+-------+-----+
|    |  A    |  B  |
+----+-------+-----+
| 0  | t     | abc |
| 1  |       | bar |
| 2  | bait  | xyz |
+----+-------+-----+
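For completeness, here is the fixed loop as a self-contained sketch. One caveat: `format` pastes each item into the pattern as raw regex text, so an item containing metacharacters (for example a hypothetical `'a.b'`, where `.` matches any character) would match more than intended. Wrapping the item in `re.escape` is an assumption added here, not part of the answer above.

```python
import re

import pandas as pd

df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'], 'B': ['abc', 'bar', 'xyz']})
replace_list = ['ba ca', 'foo']

for i in replace_list:
    # Insert the escaped value of i into the pattern; ^ anchors it to the start.
    pattern = r'^({})'.format(re.escape(i))
    df = df.replace({'A': pattern}, {'A': ''}, regex=True)

print(df)

# Why escape: an unescaped '.' matches any character ('a.b' is a made-up item).
print(re.match(r'^({})'.format('a.b'), 'axb') is not None)             # True
print(re.match(r'^({})'.format(re.escape('a.b')), 'axb') is not None)  # False
```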

Upvotes: 2
