Reputation: 57
I'm struggling with a simple string find-and-replace inside a pandas DataFrame column.
As a simple example, wherever " (C)" appears as part of a column value, I want to replace it with "".
Here is some simple code that I can't get working with .str.replace(), which, according to the first answer to this post (Python Pandas: How to replace a characters in a column of a dataframe?), I think should work. I suspect the space and brackets are confusing things. I've tried a few regexes but must be writing them wrong.
import pandas as pd
data = {'id': [1, 2, 3, 4], 'name': ['name1 (C)', 'name2 (B)', 'name3', 'name4']}
df_data = pd.DataFrame.from_dict(data)
df_data['name'] = df_data['name'].str.replace(' (C)', '')
print(df_data)
df_data['name'].replace({' (C)': ''}, inplace=True, regex=True)
print(df_data)
df_data['name'].replace({'( (C))': ''}, inplace=True, regex=True)
print(df_data)
Which yields the results:
   id       name
0   1  name1 (C)
1   2  name2 (B)
2   3      name3
3   4      name4
   id       name
0   1  name1 (C)
1   2  name2 (B)
2   3      name3
3   4      name4
   id       name
0   1  name1 (C)
1   2  name2 (B)
2   3      name3
3   4      name4
What is really confusing, however, is that if I run the same replacement on a plain string variable, everything works perfectly using the replace function.
mystr = "name (C)"
mystr.replace(" (C)", "")
Out[23]: 'name'
Any help would be greatly appreciated!!
Upvotes: 1
Views: 181
Reputation: 862841
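Escape the parentheses first, because ( and ) are special regex characters. Pass regex=True explicitly, since newer pandas versions treat the pattern as a literal string by default:
df_data['name'] = df_data['name'].str.replace(r' \(C\)', '', regex=True)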
Or:
df_data['name'] = df_data['name'].replace(r' \(C\)', '', regex=True)
print(df_data)
   id       name
0   1      name1
1   2  name2 (B)
2   3      name3
3   4      name4
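If no regex features are needed, treating the pattern as a literal string avoids the escaping altogether. The following is a minimal sketch, assuming a pandas version in which str.replace accepts the regex keyword; it reuses the question's sample data.

```python
import pandas as pd

data = {'id': [1, 2, 3, 4], 'name': ['name1 (C)', 'name2 (B)', 'name3', 'name4']}
df_data = pd.DataFrame.from_dict(data)

# regex=False makes ' (C)' a literal substring, so the parentheses
# are matched as ordinary characters and need no backslashes.
df_data['name'] = df_data['name'].str.replace(' (C)', '', regex=False)
print(df_data['name'].tolist())
```

This behaves like the plain str.replace call from the question, applied to each value in the column.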
Upvotes: 2
Reputation: 13255
Escape the special characters when the pattern is used as a regex:
df_data['name'].str.replace(r' \(C\)', '', regex=True)
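To see why the unescaped pattern matches nothing, and to avoid writing the backslashes by hand, the standard library's re.escape can build the pattern. This is only an illustrative sketch; the variable names (literal, pattern, names, cleaned) are made up for the example.

```python
import re
import pandas as pd

literal = ' (C)'
pattern = re.escape(literal)  # backslash-escapes the regex metacharacters

# Unescaped, the parentheses form a capture group, so the regex looks for
# a space followed directly by 'C'. The text has ' (' there, so no match.
unescaped_hit = re.search(literal, 'name1 (C)')
escaped_hit = re.search(pattern, 'name1 (C)')

names = pd.Series(['name1 (C)', 'name2 (B)', 'name3', 'name4'])
cleaned = names.str.replace(pattern, '', regex=True)
print(unescaped_hit, escaped_hit.group(), cleaned.tolist())
```

Building the pattern with re.escape is useful when the text to remove comes from a variable and may contain other metacharacters such as . or +.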
Upvotes: 1