Kshitij Agrawal

Reputation: 233

Issue replacing a string in a pandas DataFrame column

I am trying to remove a few strings from a pandas DataFrame column:

import pandas as pd

x = pd.DataFrame()
x['y'] = ["Hernia|Infiltration","A|Hernia|Infiltration","Infiltration|Hernia"]
x

I am running the code below:

x['y'] = x['y'].replace({'|Hernia': ''},regex=True)
x['y'] = x['y'].str.replace('Hernia|', '',regex=True)
x

But the output is wrong:

     y
0   |Infiltration
1   A||Infiltration
2   Infiltration|

Correct / expected output:

     y
0   Infiltration
1   A|Infiltration
2   Infiltration

Any string can appear in place of A and Infiltration, but the pattern stays the same.

Upvotes: 1

Views: 51

Answers (2)

Quang Hoang

Reputation: 150735

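You need to escape | in replace. With regex=True an unescaped | means alternation, so '|Hernia' matches either the empty string or Hernia, and the literal | is left behind:

x['y'] = x['y'].replace({r'\|Hernia': ''},regex=True)
x['y'] = x['y'].replace({r'Hernia\|': ''},regex=True)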
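To see why the escape matters, here is a minimal sketch using only the standard re module (pandas relies on the same regex engine when regex=True). The sample string is taken from the question; the variable names are just for illustration.

```python
import re

s = "A|Hernia|Infiltration"

# Unescaped: "|" is alternation, so this means "empty string OR Hernia".
# Only the word is removed and both pipes survive.
unescaped = re.sub('|Hernia', '', s)

# Escaped: "\|" is a literal pipe, so the pipe is removed together with the word.
escaped = re.sub(r'\|Hernia', '', s)

print(unescaped)
print(escaped)
```

On Python 3.7 or later the first call should give A||Infiltration (the wrong output from the question) and the second A|Infiltration.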

Following @user3483203's and @piRSquared's comments, you can also join the two patterns into one, with an unescaped | acting as an or:

x['y'] = x['y'].replace({r'\|Hernia|Hernia\|': ''},regex=True)
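If the word to remove is not fixed, the combined pattern can be built with re.escape instead of escaping by hand. This is only a sketch: build_removal_pattern is a hypothetical helper name, and it assumes the word never appears as part of a longer item (for example HerniaX) and that a value is never the word alone.

```python
import re

def build_removal_pattern(token, sep="|"):
    """Return a regex matching `token` plus one adjacent separator (hypothetical helper)."""
    tok = re.escape(token)  # escape any regex metacharacters in the word
    s = re.escape(sep)      # "|" becomes "\|"
    return f"{s}{tok}|{tok}{s}"

pattern = build_removal_pattern("Hernia")
values = ["Hernia|Infiltration", "A|Hernia|Infiltration", "Infiltration|Hernia"]
cleaned = [re.sub(pattern, "", v) for v in values]
print(cleaned)  # ['Infiltration', 'A|Infiltration', 'Infiltration']
```

The same pattern string can be passed to x['y'].replace({pattern: ''}, regex=True).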

Upvotes: 3

Chris

Reputation: 16137

This can probably be handled more elegantly with split/join, which avoids regex escaping altogether:

x['y'].apply(lambda row: '|'.join(part for part in row.split('|') if part != 'Hernia'))

Output:

0      Infiltration
1    A|Infiltration
2      Infiltration
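The same split/join idea as a small named function, in case it needs to be reused or tested on its own. This is a sketch; drop_token and its parameters are names made up for this example, and it is shown on a plain list so it runs without pandas.

```python
def drop_token(value, token="Hernia", sep="|"):
    """Remove every item equal to `token` from a `sep`-delimited string."""
    return sep.join(part for part in value.split(sep) if part != token)

values = ["Hernia|Infiltration", "A|Hernia|Infiltration", "Infiltration|Hernia"]
result = [drop_token(v) for v in values]
print(result)  # ['Infiltration', 'A|Infiltration', 'Infiltration']
```

On the DataFrame it could be applied as x['y'].map(drop_token). Because items are compared whole, a longer item that merely contains Hernia is left untouched.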

Upvotes: 3
