Aniruddh Maini

Reputation: 35

How to clean entries in a pandas dataframe column that contain parentheses or numbers?

I have a dataframe like this:
     Country                           Energy Supply  Energy Supply per Capita
16   Afghanistan                       3.210000e+08   10.0
17   Albania                           1.020000e+08   35.0
18   Algeria                           1.959000e+09   51.0
19   American Samoa                    NaN            NaN
40   Bolivia (Plurinational State of)  3.360000e+08   32.0
...  ...                               ...            ...
213  Switzerland17                     1.113000e+09   136.0
214  Syrian Arab Republic              5.420000e+08   28.0
215  Tajikistan                        1.060000e+08   13.0
216  Thailand                          5.336000e+09   79.0
228  Ukraine18                         4.844000e+09   107.0
232  United States of America20        9.083800e+10   286.0

I need to clean the names of all countries that contain parentheses or numbers. For example, 'Bolivia (Plurinational State of)' should become 'Bolivia', 'Switzerland17' should become 'Switzerland', and 'United States of America20' should become 'United States of America'. I tried replace() and split(), but neither worked.

Can somebody please help me with this?

Upvotes: 1

Views: 68

Answers (3)

MarianD

Reputation: 14181

df["Country"] = df["Country"].str.extract(r"([^(\d]+)", expand=False)
      Country              Energy Supply     Energy Supply per Capita
16   Afghanistan           3.210000e+08     10.0
17   Albania               1.020000e+08     35.0
18   Algeria               1.959000e+09     51.0
19   American Samoa                 NaN     NaN
40   Bolivia               3.360000e+08     32.0
213  Switzerland           1.113000e+09     136.0
214  Syrian Arab Republic  5.420000e+08     28.0
215  Tajikistan            1.060000e+08     13.0
216  Thailand              5.336000e+09     79.0
228  Ukraine               4.844000e+09     107.0
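
To see why this pattern works: `[^(\d]+` matches a run of characters that are neither `(` nor a digit, and `str.extract` keeps the first such match, much like `re.search`. Below is a minimal sketch using only the standard `re` module and sample names from the question. The helper name `clean_name` is made up for illustration, and the `.strip()` call is an addition, since the match for the Bolivia entry otherwise keeps the space before the parenthesis.

```python
import re

# Same pattern as in the answer: a run of characters that are
# neither "(" nor a digit.
PATTERN = re.compile(r"([^(\d]+)")


def clean_name(name):
    """Hypothetical helper: keep the part of a name before any digit or "("."""
    match = PATTERN.search(name)
    # search() returns the first match, which is what str.extract keeps too.
    # strip() removes the space left in front of a removed parenthesis.
    return match.group(1).strip() if match else None


samples = [
    "Afghanistan",
    "Bolivia (Plurinational State of)",
    "Switzerland17",
    "United States of America20",
]
cleaned = [clean_name(s) for s in samples]
print(cleaned)
```

On a dataframe, the same trimming can be done by chaining `.str.strip()` after `str.extract(..., expand=False)`.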

Upvotes: 1

Mayank Porwal

Reputation: 34086

You can combine multiple regex patterns with str.replace like this:

Consider the dataframe below:

In [1431]: df 
Out[1431]: 
                            Country
0                       Afghanistan
1  Bolivia (Plurinational State of)
2                     Switzerland17

In [1433]: df['Country'] = df['Country'].str.replace(r"\(.*\)|\d+", '', regex=True)
In [1434]: df  
Out[1434]: 
         Country
0    Afghanistan
1       Bolivia 
2    Switzerland
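
Two details are worth noting about this approach. The output above still has a trailing space after `Bolivia`, and in pandas 2.0 and later `str.replace` treats the pattern as a literal string unless `regex=True` is passed. A minimal self-contained sketch that handles both, assuming the same three sample rows:

```python
import pandas as pd

# Sample rows taken from the example above.
df = pd.DataFrame(
    {"Country": ["Afghanistan", "Bolivia (Plurinational State of)", "Switzerland17"]}
)

df["Country"] = (
    df["Country"]
    # Remove a parenthesised suffix or any run of digits.
    # regex=True is required in pandas 2.0+, where the default is a literal match.
    .str.replace(r"\(.*\)|\d+", "", regex=True)
    # Drop the space left behind in front of the removed parenthesis.
    .str.strip()
)
print(df["Country"].tolist())
```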

Upvotes: 2

Quang Hoang

Reputation: 150785

You can use this regex pattern with str.extract:

df['Country'] = df.Country.str.extract(r'^([^\d\(]*)')[0]

Output:

                      Country  Energy Supply  Energy Supply per Capita
16                Afghanistan   3.210000e+08                      10.0
17                    Albania   1.020000e+08                      35.0
18                    Algeria   1.959000e+09                      51.0
19             American Samoa            NaN                       NaN
40                   Bolivia    3.360000e+08                      32.0
213               Switzerland   1.113000e+09                     136.0
214      Syrian Arab Republic   5.420000e+08                      28.0
215                Tajikistan   1.060000e+08                      13.0
216                  Thailand   5.336000e+09                      79.0
228                   Ukraine   4.844000e+09                     107.0
232  United States of America   9.083800e+10                     286.0

Upvotes: 1
