Reputation: 153
I have a dataframe, and a list of strings that I want to remove from a column in that dataframe. But when I use the replace function those characters remain. Can someone please explain why this is the case?
bad_chars = ['?', '!', ',', ';', "'", '|', '-', '--', '(', ')',
'[', ']', '{', '}', ':', '&', '\n']
And to replace:
df2['page'] = df2['page'].replace(bad_chars, '')
When I print out df2:
for index, row in df2.iterrows():
    print(row['project'] + '\t' + '(' + row['page'] + ',' + str(row['viewCount']) + ')' + '\n')
en (The_Voice_(U.S._season_14),613)
Upvotes: 3
Views: 1308
Reputation: 617
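Use .str.replace, and pass your strings as a single, pipe-separated pattern. Escape each string with re.escape() before joining, as suggested by @jpp, so that regex metacharacters are matched literally. Escaping the already-joined string would also escape the | separators, and the pattern would then match nothing useful. This is his suggestion with map in place of the list comprehension:
import re
# regex=True is needed on pandas 2.0+, where str.replace matches literally by default
df2['page'] = df2['page'].str.replace('|'.join(map(re.escape, bad_chars)), '', regex=True)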
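To see why the order of escaping and joining matters, here is a small sketch using plain re.sub on one string. The shortened bad_chars list and the variable names are only for illustration and are not from the question.

```python
import re

# Shortened, illustrative subset of the bad_chars list from the question.
bad_chars = ['?', '|', '(', ')']

# Escaping after joining also escapes the '|' separators, so the pattern
# only matches the literal text "?|(|)".
joined_then_escaped = re.escape('|'.join(bad_chars))

# Escaping each item first keeps '|' as the alternation operator between items.
escaped_then_joined = '|'.join(map(re.escape, bad_chars))

text = 'The_Voice_(U.S._season_14)?'
after_joined_then_escaped = re.sub(joined_then_escaped, '', text)
after_escaped_then_joined = re.sub(escaped_then_joined, '', text)

print(after_joined_then_escaped)  # text comes back unchanged
print(after_escaped_then_joined)  # brackets and question mark removed
```

The first pattern never matches unless the text happens to contain the joined string itself, which is why per-item escaping is the safer choice.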
Upvotes: 0
Reputation: 164663
One way is to escape your characters using re, then use pd.Series.str.replace.
import pandas as pd
import re
bad_chars = ['?', '!', ',', ';', "'", '|', '-', '--', '(', ')',
'[', ']', '{', '}', ':', '&', '\n']
df = pd.DataFrame({'page': ['hello?', 'problems|here', 'nothingwronghere', 'nobrackets[]']})
# regex=True is needed on pandas 2.0+, where str.replace matches literally by default
df['page'] = df['page'].str.replace('|'.join([re.escape(s) for s in bad_chars]), '', regex=True)
print(df)
#                page
# 0             hello
# 1      problemshere
# 2  nothingwronghere
# 3        nobrackets
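As for why the original call left the characters in place: as far as I can tell, Series.replace with a list compares each item against the whole cell value by default, so only cells that consist entirely of one of the listed strings get replaced. A small sketch of the difference, using a shortened list and made-up sample values:

```python
import re
import pandas as pd

# Shortened, illustrative subset of bad_chars; the sample values are made up.
bad_chars = ['?', '|', '(', ')']
s = pd.Series(['hello?', '?', 'The_Voice_(U.S._season_14)'])

# Series.replace: each list item is compared with the entire cell value,
# so only the cell that is exactly '?' changes.
whole_value = s.replace(bad_chars, '')

# Series.str.replace with a regex pattern works on substrings inside each cell.
pattern = '|'.join(map(re.escape, bad_chars))
substrings = s.str.replace(pattern, '', regex=True)

print(whole_value.tolist())
print(substrings.tolist())
```

So .replace is the right tool for swapping whole values, and .str.replace is the one for stripping characters out of strings.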
Upvotes: 3