Reputation: 171
I am relatively new to regex, and I am trying to replace part of each string in a string column of a pandas DataFrame. The challenge is that there are several different substrings I want to remove from the column while keeping the rest of each string.
I have code that works for one substring, but when I try to use a for loop, it does not work. I am not sure how to reference the loop variable inside the regex pattern.
Here is the code that works for a single substring:
import pandas as pd
df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'], 'B': ['abc', 'bar', 'xyz']})
df
# Remove 'ba ca' only where it appears at the start of a value in column A
df = df.replace({'A': r'^ba ca'}, {'A': ''}, regex=True)
df
Here is the code that does not work when I try to use a for loop:
df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'], 'B': ['abc', 'bar', 'xyz']})
replace_list = ['ba ca', 'foo']
for i in replace_list:
    df = df.replace({'A': r'^(i)'}, {'A': ''}, regex=True)
df
I would like to iterate over a list of strings to remove them from a column in the DataFrame.
Upvotes: 0
Views: 830
Reputation: 402443
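'^(i)' is not the correct way to perform string interpolation: Python does not substitute the loop variable into a string literal, so that pattern is always the same regex, matching a literal i at the start of the string. You're looking for something along the lines of f-string formatting (rf'^{i}') or str.format (r'^{}'.format(i)).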
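A minimal sketch of the question's loop with the loop variable actually interpolated, using the f-string form rf'^{i}'. It also prints the original pattern to show that it never changes between iterations. The data and list are taken from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'], 'B': ['abc', 'bar', 'xyz']})
replace_list = ['ba ca', 'foo']

# The original pattern is a fixed string: the regex ^(i), i.e. a literal
# letter i at the start of the value. The loop variable is never used.
print(r'^(i)')  # prints ^(i)

for i in replace_list:
    # rf'^{i}' substitutes the value of i into the raw string,
    # giving '^ba ca' on the first pass and '^foo' on the second.
    df = df.replace({'A': rf'^{i}'}, {'A': ''}, regex=True)

print(df['A'].tolist())  # [' t', '', 'bait']
```

Note that 'ba ca t' keeps its leading space, because only the matched prefix 'ba ca' is removed.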
A better solution here, though, is to drop the loop entirely, since replace accepts a list of patterns and applies all of them in a single call.
df.replace({'A': replace_list}, '', regex=True)
      A    B
0     t  abc
1        bar
2  bait  xyz
Or, with str.replace:
df['A'].str.replace('|'.join(replace_list), '', regex=True)  # regex=True is required in pandas >= 2.0
0 t
1
2 bait
Name: A, dtype: object
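Both calls above match the list entries anywhere in the string and interpret each entry as a regular expression. A sketch that keeps the ^ anchor from the question and builds one alternation, so no loop is needed. The use of re.escape is an addition here, assuming the entries should match literally even if they contain characters such as . or +.

```python
import re

import pandas as pd

df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'], 'B': ['abc', 'bar', 'xyz']})
replace_list = ['ba ca', 'foo']

# re.escape makes each entry match literally; the non-capturing group (?:...)
# makes the ^ anchor apply to every alternative, not just the first one.
pattern = '^(?:' + '|'.join(map(re.escape, replace_list)) + ')'

# regex=True is passed explicitly: since pandas 2.0, str.replace
# treats the pattern as a plain string by default.
df['A'] = df['A'].str.replace(pattern, '', regex=True)
print(df['A'].tolist())  # [' t', '', 'bait']
```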
This post of mine may also be worth a read: What is the difference between Series.replace and Series.str.replace?
Upvotes: 3
Reputation: 4482
Since you want the value of i to be inserted into your regex pattern, you should consider this change to the loop body:
for i in replace_list:
    df = df.replace({'A': r'^({})'.format(i)}, {'A': ''}, regex=True)
Output
+----+-------+-----+
| | A | B |
+----+-------+-----+
| 0 | t | abc |
| 1 | | bar |
| 2 | bait | xyz |
+----+-------+-----+
Upvotes: 2