anarchy

Reputation: 5184

How to use regex to modify a string in pandas in different cases

I have the following dataframe called df:

   Symbol  Country  Type  etc...
0  AG.L    UK       OS
1  UZ.     UK       OS
2  DT      UK       OS
3  XX.L    US       OS
4  MSFT    US       OS
5  AAPL    US       OS
6  DB.S    SG       OS

I want to modify the Symbol column for rows where Country == 'UK'. There are 3 possible cases:

Case 1: ends with .L, do nothing
Case 2: ends with ., add 'L' to the end
Case 3: ends with neither . nor .L, add '.L' to the end

As long as the Country == 'UK', I want the symbol to end with '.L'.

So it should look like this.

   Symbol  Country  Type  etc...
0  AG.L    UK       OS
1  UZ.L    UK       OS
2  DT.L    UK       OS
3  XX.L    US       OS
4  MSFT    US       OS
5  AAPL    US       OS
6  DB.S    SG       OS

I use the following code.

df.loc[df['Country'].eq('UK'),'Symbol'] = df.loc[df['Country'].eq('UK'),'Symbol'].str.replace(r'\.', '.L').str.replace(r'[a-z]$', '.L') 

But I get this:

AG.LL  
UZ.L    
DT      

What's the right way to do it?

Upvotes: 3

Views: 69

Answers (1)

mac13k

Reputation: 2663

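You almost got it right. The dot replacement is missing the dollar sign, so it replaces every dot rather than only a trailing one (which is why AG.L became AG.LL). The second pattern, [a-z]$, matches only a lowercase letter, so DT was left unchanged. Anchor both patterns instead, and pass regex=True explicitly, since pandas 2.0 and later treat the pattern as a literal string by default:

df.loc[df['Country'].eq('UK'),'Symbol'] = df.loc[df['Country'].eq('UK'),'Symbol'].str.replace(r'^([A-Z]+)$', r'\1.L', regex=True).str.replace(r'\.$', '.L', regex=True)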

In my Python shell it outputs:

0    AG.L
1    UZ.L
2    DT.L
Name: Symbol, dtype: object
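
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question and uses a slightly different approach that I am suggesting as an alternative, not something from the code above: strip an optional trailing '.' or '.L' first, then append '.L' unconditionally. The variable name uk is just for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Symbol": ["AG.L", "UZ.", "DT", "XX.L", "MSFT", "AAPL", "DB.S"],
    "Country": ["UK", "UK", "UK", "US", "US", "US", "SG"],
    "Type": ["OS"] * 7,
})

# Boolean mask selecting only the UK rows.
uk = df["Country"].eq("UK")

# Remove a trailing "." or ".L" if present, then append ".L" to every UK symbol.
# regex=True is passed explicitly because newer pandas defaults to literal matching.
df.loc[uk, "Symbol"] = (
    df.loc[uk, "Symbol"].str.replace(r"\.L?$", "", regex=True) + ".L"
)

print(df["Symbol"].tolist())
# ['AG.L', 'UZ.L', 'DT.L', 'XX.L', 'MSFT', 'AAPL', 'DB.S']
```

This handles all three cases with one pattern and does not depend on the symbol consisting only of uppercase letters, which the ^([A-Z]+)$ pattern assumes.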

Upvotes: 3
