Pandas dataframe replace substring in column giving unexpected result

Question

I'm really struggling with a simple string find and replace inside a Pandas dataframe column.

As a simple example, where I find " (C)" as part of a column value, I would want to replace this with "".

Here's some really simple code that I can't get working with .str.replace(). According to the first answer to this post (Python Pandas: How to replace a characters in a column of a dataframe?) I think it should work, but I guess the space and brackets might be confusing things. I've tried some regexes but must clearly be getting them wrong.

import pandas as pd

data = {'id': [1, 2, 3, 4], 'name': ['name1 (C)', 'name2 (B)', 'name3', 'name4']}
df_data = pd.DataFrame.from_dict(data)
df_data['name'] = df_data['name'].str.replace(' (C)', '')
print(df_data)
df_data['name'].replace({' (C)': ''}, inplace=True, regex=True)
print(df_data)
df_data['name'].replace({'( (C))': ''}, inplace=True, regex=True)
print(df_data)

Which yields the results:

   id       name
0   1  name1 (C)
1   2  name2 (B)
2   3      name3
3   4      name4
   id       name
0   1  name1 (C)
1   2  name2 (B)
2   3      name3
3   4      name4
   id       name
0   1  name1 (C)
1   2  name2 (B)
2   3      name3
3   4      name4

What is really confusing, however, is that if I run this on just a plain string variable, everything works perfectly using the replace function.

mystr = "name (C)"
mystr.replace(" (C)", "")
Out[23]: 'name'

Any help would be greatly appreciated!!

jezrael · Accepted Answer

Escape the parentheses first, because ( and ) are special regex characters:

df_data['name'] = df_data['name'].str.replace(r' \(C\)', '', regex=True)

Or:

df_data['name'] = df_data['name'].replace(r' \(C\)', '', regex=True)

print(df_data)
   id       name
0   1      name1
1   2  name2 (B)
2   3      name3
3   4      name4
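
As a hedged sketch of why the unescaped pattern finds nothing, and of two ways to avoid writing the backslashes by hand: the example below rebuilds the question's dataframe and assumes a pandas version where Series.str.replace accepts the regex keyword (it is the default behaviour to treat the pattern literally in pandas 2.0 and later). The variable names literal and escaped are illustrative only and are not part of the original answer.

```python
import re

import pandas as pd

data = {'id': [1, 2, 3, 4], 'name': ['name1 (C)', 'name2 (B)', 'name3', 'name4']}
df_data = pd.DataFrame(data)

# As a regex, ' (C)' means "a space, then a capture group containing C",
# so it searches for the two characters ' C' and never matches 'name1 (C)'.
assert re.search(' (C)', 'name1 (C)') is None
assert re.search(r' \(C\)', 'name1 (C)') is not None

# Option 1: treat the pattern as a literal string, so no escaping is needed.
literal = df_data['name'].str.replace(' (C)', '', regex=False)

# Option 2: keep regex matching, but let re.escape add the backslashes.
escaped = df_data['name'].str.replace(re.escape(' (C)'), '', regex=True)

print(literal.tolist())
print(escaped.tolist())
```

Both options leave 'name2 (B)' untouched and strip only the ' (C)' suffix. Passing regex=False is probably the simpler choice when the text to remove is a fixed string, while re.escape is useful when a literal fragment has to be combined with a larger regex.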
