Reputation: 35
I have a dataframe like this:
Country Energy Supply Energy Supply per Capita
16 Afghanistan 3.210000e+08 10.0
17 Albania 1.020000e+08 35.0
18 Algeria 1.959000e+09 51.0
19 American Samoa NaN NaN
40 Bolivia (Plurinational State of) 3.360000e+08 32.0
... ... ... ...
213 Switzerland17 1.113000e+09 136.0
214 Syrian Arab Republic 5.420000e+08 28.0
215 Tajikistan 1.060000e+08 13.0
216 Thailand 5.336000e+09 79.0
228 Ukraine18 4.844000e+09 107.0
232 United States of America20 9.083800e+10 286.0
I need to clean up the names of all countries that contain parentheses or numbers. For example, 'Bolivia (Plurinational State of)' should become 'Bolivia', 'Switzerland17' should become 'Switzerland', and 'United States of America20' should become 'United States of America'. I tried replace() and split(), but neither worked for me.
Can somebody please help me with this?
Upvotes: 1
Views: 68
Reputation: 14181
df.Country = df.Country.str.extract(r"([^(\d]+)")
Country Energy Supply Energy Supply per Capita
16 Afghanistan 3.210000e+08 10.0
17 Albania 1.020000e+08 35.0
18 Algeria 1.959000e+09 51.0
19 American Samoa NaN NaN
40 Bolivia 3.360000e+08 32.0
213 Switzerland 1.113000e+09 136.0
214 Syrian Arab Republic 5.420000e+08 28.0
215 Tajikistan 1.060000e+08 13.0
216 Thailand 5.336000e+09 79.0
228 Ukraine 4.844000e+09 107.0
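For anyone who wants to try this on a small frame, here is a minimal sketch of the same extract approach. The four sample rows are copied from the question. The `expand=False` argument and the trailing `.str.strip()` are my additions, not part of the answer above: the first returns a Series instead of a one-column DataFrame, and the second removes the space that is otherwise left in front of the stripped parenthesis.

```python
import pandas as pd

# Sample rows taken from the question; the real frame has more columns.
df = pd.DataFrame(
    {
        "Country": [
            "Afghanistan",
            "Bolivia (Plurinational State of)",
            "Switzerland17",
            "United States of America20",
        ]
    },
    index=[16, 40, 213, 232],
)

# Capture everything before the first "(" or digit.
# expand=False returns a Series; .str.strip() drops the space that was
# sitting in front of the parenthesis (an addition to the original answer).
df["Country"] = (
    df["Country"].str.extract(r"([^(\d]+)", expand=False).str.strip()
)

cleaned = df["Country"].tolist()
print(cleaned)
```

Without the strip step the Bolivia row would come back as 'Bolivia ' with a trailing space, which is easy to miss in printed output but can break later merges or lookups on the country name.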
Upvotes: 1
Reputation: 34086
You can combine multiple regex patterns with str.replace, like this:
Consider the dataframe below:
In [1431]: df
Out[1431]:
Country
0 Afghanistan
1 Bolivia (Plurinational State of)
2 Switzerland17
In [1433]: df['Country'] = df['Country'].str.replace(r"\(.*\)|\d+", '', regex=True)
In [1434]: df
Out[1434]:
Country
0 Afghanistan
1 Bolivia
2 Switzerland
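A self-contained sketch of the same str.replace idea, in case it helps to run it outside an IPython session. It passes `regex=True` explicitly because pandas 2.0 changed the default of `Series.str.replace` to literal matching, so without it the pattern would not be treated as a regex on current versions. The final `.str.strip()` is my own addition to clear the leftover space before the removed parenthesis.

```python
import pandas as pd

country = pd.Series(
    ["Afghanistan", "Bolivia (Plurinational State of)", "Switzerland17"]
)

# Remove any "(...)" group or run of digits.
# regex=True is required on pandas 2.0+, where the default is literal matching.
raw = country.str.replace(r"\(.*\)|\d+", "", regex=True)

# The replace alone leaves "Bolivia " with a trailing space; strip it.
# (The strip step is an addition, not part of the original answer.)
result = raw.str.strip().tolist()
print(result)
```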
Upvotes: 2
Reputation: 150785
You can use this regex pattern with str.extract:
df['Country'] = df.Country.str.extract(r'^([^\d\(]*)')[0]
Output:
Country Energy Supply Energy Supply per Capita
16 Afghanistan 3.210000e+08 10.0
17 Albania 1.020000e+08 35.0
18 Algeria 1.959000e+09 51.0
19 American Samoa NaN NaN
40 Bolivia 3.360000e+08 32.0
213 Switzerland 1.113000e+09 136.0
214 Syrian Arab Republic 5.420000e+08 28.0
215 Tajikistan 1.060000e+08 13.0
216 Thailand 5.336000e+09 79.0
228 Ukraine 4.844000e+09 107.0
232 United States of America 9.083800e+10 286.0
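One thing worth knowing when choosing between the extract and replace approaches in this thread: they agree on the data in the question, but not in general. The sketch below uses made-up names (hypothetical, not from the question's dataset) to show the difference. The extract pattern cuts everything from the first digit or parenthesis onward, while the replace pattern removes only the matched pieces and keeps whatever follows.

```python
import pandas as pd

# Hypothetical names, chosen only to show where the two approaches differ.
names = pd.Series(["Switzerland17", "Area 51 Zone", "Foo (bar) Baz"])

# Keep only the text before the first digit or "(".
extracted = names.str.extract(r"^([^\d(]*)", expand=False).str.strip()

# Delete "(...)" groups and digit runs, keeping any text after them.
replaced = names.str.replace(r"\(.*\)|\d+", "", regex=True).str.strip()

print(extracted.tolist())
print(replaced.tolist())
```

Note that the replace version can leave a double space where a piece was removed from the middle of a name, so a follow-up whitespace cleanup may be needed if your data has such cases.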
Upvotes: 1