Reputation: 115
In my dataframe, several countries have numbers and/or parentheses in their names. I want to remove the parentheses and numbers from these country names.
For example: 'Bolivia (Plurinational State of)' should become 'Bolivia', and 'Switzerland17' should become 'Switzerland'.
Here is my code, but it does not seem to work:
import numpy as np
import pandas as pd
def func():
    energy=pd.ExcelFile('Energy Indicators.xls').parse('Energy')
    energy=energy.iloc[16:243][['Environmental Indicators: Energy','Unnamed: 3','Unnamed: 4','Unnamed: 5']].copy()
    energy.columns=['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
    o="..."
    n=np.NaN
    energy = energy.replace('...', np.nan)
    energy['Energy Supply']=energy['Energy Supply']*1000000
    old=["Republic of Korea","United States of America","United Kingdom of "
         +"Great Britain and Northern Ireland","China, Hong "
         +"Kong Special Administrative Region"]
    new=["South Korea","United States","United Kingdom","Hong Kong"]
    for i in range(0,4):
        energy = energy.replace(old[i], new[i])
    #I'm trying to remove it here =====>
    p="("
    for j in range(16,243):
        if p in energy.iloc[j]['Country']:
            country=""
            for c in energy.iloc[j]['Country'] :
                while(c!=p & !c.isnumeric()):
                    country=c+country
            energy = energy.replace(energy.iloc[j]['Country'], country)
    return energy
Here is the .xls file I'm working on: https://drive.google.com/file/d/0B80lepon1RrYeDRNQVFWYVVENHM/view?usp=sharing
Upvotes: 1
Views: 1527
Reputation: 403128
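Use str.extract to keep only the leading run of letters (note that the column in your code is named 'Country', with a capital C):
energy['Country'] = energy['Country'].str.extract(r'(^[a-zA-Z]+)', expand=False)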
df
country
0 Bolivia (Plurinational State of)
1 Switzerland17
df['country'] = df['country'].str.extract('(^[a-zA-Z]+)', expand=False)
df
country
0 Bolivia
1 Switzerland
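One thing to keep in mind is that this pattern stops at the first character that is not an ASCII letter, so a multi-word name is cut off at the first space. A minimal sketch, assuming a made-up two-row frame (the `sample` data and the `first_word` name are illustrative, not taken from the Excel file):

```python
import pandas as pd

# Hypothetical sample data, not taken from the Excel file in the question.
sample = pd.DataFrame({'country': ['Switzerland17', 'West Indies (foo bar)']})

# ^[a-zA-Z]+ matches only the leading run of ASCII letters,
# so matching ends at the digit in row 0 and at the space in row 1.
first_word = sample['country'].str.extract(r'(^[a-zA-Z]+)', expand=False)

print(first_word.tolist())  # ['Switzerland', 'West']
```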
To handle country names that contain spaces (very common), add \s to the character class and strip the trailing whitespace afterwards:
df
country
0 Bolivia (Plurinational State of)
1 Switzerland17
2 West Indies (foo bar)
df['country'] = df['country'].str.extract(r'(^[a-zA-Z\s]+)', expand=False).str.strip()
df
country
0 Bolivia
1 Switzerland
2 West Indies
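A letters-and-spaces character class still stops at hyphens, apostrophes and accented letters, so names such as 'Guinea-Bissau' would be truncated. A possible alternative is to delete the unwanted parts instead of extracting the wanted ones, using str.replace with a regex. This is only a sketch: the `clean_country` helper and the sample rows are hypothetical, and it assumes the unwanted text is always either a parenthesised group or a run of digits.

```python
import pandas as pd

def clean_country(names: pd.Series) -> pd.Series:
    """Drop parenthesised text and digits, then trim surrounding whitespace."""
    # \s*\(.*?\) : optional whitespace followed by a "(...)" group
    # \d+        : any run of digits (footnote markers such as the 17)
    return names.str.replace(r'\s*\(.*?\)|\d+', '', regex=True).str.strip()

# Hypothetical sample data, not taken from the Excel file in the question.
sample = pd.DataFrame({'country': [
    'Bolivia (Plurinational State of)',
    'Switzerland17',
    'Guinea-Bissau',
    "Côte d'Ivoire",
]})

sample['country'] = clean_country(sample['country'])
print(sample['country'].tolist())
# ['Bolivia', 'Switzerland', 'Guinea-Bissau', "Côte d'Ivoire"]
```

Because everything that is not explicitly removed is kept, punctuation and non-ASCII letters inside a name survive unchanged.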
Upvotes: 2