Rob

Reputation: 153

How to replace substrings in strings in pandas dataframe

I have a dataframe, and a list of strings that I want to remove from a column in that dataframe. But when I use the replace function those characters remain. Can someone please explain why this is the case?

bad_chars = ['?', '!', ',', ';', "'", '|', '-', '--', '(', ')', 
             '[', ']', '{', '}', ':', '&', '\n']

And to replace them:

df2['page'] = df2['page'].replace(bad_chars, '')

When I print out df2, the characters are still there:

for index, row in df2.iterrows():
    print( row['project'] + '\t' + '(' + row['page'] + ',' + str(row['viewCount']) + ')' + '\n'  )

en (The_Voice_(U.S._season_14),613)

Upvotes: 3

Views: 1308

Answers (2)

mcard

Reputation: 617

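Use .str.replace, and pass your strings as a single, pipe-separated regex pattern. Escape each string with re.escape() before joining, as suggested by @jpp, so that regex metacharacters are matched literally; escaping the already-joined string would also escape the | separators and turn the whole pattern into one literal. I tweak his suggestion a bit by using map instead of a list comprehension:

import re
df2['page'] = df2['page'].str.replace('|'.join(map(re.escape, bad_chars)), '', regex=True)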
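The order of escaping and joining matters here. A minimal sketch using only the re module, with a shortened bad_chars list and a sample string chosen purely for illustration:

```python
import re

# Shortened, illustrative list; the question's full list behaves the same way.
bad_chars = ['?', '|', '(', ')']

# Escaping after joining also escapes the '|' separators,
# so the result is one long literal string.
joined_then_escaped = re.escape('|'.join(bad_chars))

# Escaping each piece first keeps '|' as the alternation operator
# between the pieces.
escaped_then_joined = '|'.join(map(re.escape, bad_chars))

text = 'The_Voice_(U.S._season_14)?'
a = re.sub(joined_then_escaped, '', text)   # literal never occurs, text unchanged
b = re.sub(escaped_then_joined, '', text)   # each bad character is removed

print(a)
print(b)
```

The first pattern only matches the exact character sequence ?|||(|), which never appears in real data, so nothing is removed.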

Upvotes: 0

jpp

Reputation: 164663

One way is to escape each of your strings using re.escape, join them into a single regex pattern, then use pd.Series.str.replace.

import pandas as pd
import re

bad_chars = ['?', '!', ',', ';', "'", '|', '-', '--', '(', ')', 
             '[', ']', '{', '}', ':', '&', '\n']

df = pd.DataFrame({'page': ['hello?', 'problems|here', 'nothingwronghere', 'nobrackets[]']})

# regex=True is needed on pandas 2.0+, where str.replace matches literally by default
df['page'] = df['page'].str.replace('|'.join([re.escape(s) for s in bad_chars]), '', regex=True)

print(df)

#                page
# 0             hello
# 1      problemshere
# 2  nothingwronghere
# 3        nobrackets
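As for why the original attempt left the characters in place: as far as I can tell, Series.replace without regex=True compares each whole cell value against the list, so a cell is only changed when it is exactly equal to one of the strings. A small sketch of the difference, using a made-up series (the names s, whole and sub are only for illustration):

```python
import pandas as pd

s = pd.Series(['hello?', '?', 'The_Voice_(U.S._season_14)'])

# Series.replace (no regex) matches entire cell values:
# only the cell that is exactly '?' is replaced.
whole = s.replace(['?', '(', ')'], '')

# Series.str.replace operates on substrings inside each cell.
sub = s.str.replace(r'[?()]', '', regex=True)

print(whole.tolist())
print(sub.tolist())
```

The character class [?()] is used here only to keep the example short; the escaped, pipe-joined pattern above does the same job for the full list.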

Upvotes: 3
