Reputation: 5
def replace_name(row):
    if row['Country Name'] == 'Korea, Rep.':
        row['Country Name'] = 'South Korea'
    if row['Country Name'] == 'Iran, Islamic Rep.':
        row['Country Name'] = 'Iran'
    if row['Country Name'] == 'Hong Kong SAR, China':
        row['Country Name'] = 'Hong Kong'
    return row
GDP.apply(replace_name, axis = 1)
GDP is a pd.DataFrame.
After running this, when I look for 'South Korea' it is not there; the name is still 'Korea, Rep.'.
But if I change the last line of the code to this:
GDP = GDP.apply(replace_name, axis = 1)
it works.
At first I thought the reason was that apply cannot change GDP itself, but when I did the same thing with another DataFrame, it did change it. The code is below:
def change_name(row):
    if row['Country'] == "Republic of Korea":
        row['Country'] = 'South Korea'
    if row['Country'] == 'United States of America':
        row['Country'] = 'United States'
    if row['Country'] == 'United Kingdom of Great Britain and Northern Ireland':
        row['Country'] = 'United Kingdom'
    if row['Country'] == 'China, Hong Kong Special Administrative Region':
        row['Country'] = 'Hong Kong'
    return row
energy.apply(change_name, axis = 1)
energy is also a pd.DataFrame.
This time, when I search for 'United States', it is found. The original name was 'United States of America', so the name was changed successfully.
The only difference between energy and GDP is that energy is read from an Excel file and GDP is read from a CSV file. So what causes the different results?
Upvotes: 0
Views: 64
Reputation: 862406
I think it is better to use replace:
d = {'Korea, Rep.':'South Korea', 'Iran, Islamic Rep.':'Iran',
'Hong Kong SAR, China':'Hong Kong'}
GDP['Country Name'] = GDP['Country Name'].replace(d, regex=True)
The difference may be caused by stray whitespace in the data, so stripping it may help:
GDP['Country Name'] = GDP['Country Name'].str.strip()
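The two steps can also be combined: strip first, then replace whole values without regex. The following is only a minimal sketch on made-up data (the stray spaces and the 'Japan' row are assumptions for illustration, not taken from the real file):

```python
import pandas as pd

# Hypothetical sample; the leading/trailing spaces mimic what may be in the real data.
GDP = pd.DataFrame({'Country Name': [' Korea, Rep. ',
                                     'Iran, Islamic Rep.',
                                     'Hong Kong SAR, China ',
                                     'Japan']})

d = {'Korea, Rep.': 'South Korea',
     'Iran, Islamic Rep.': 'Iran',
     'Hong Kong SAR, China': 'Hong Kong'}

# Strip whitespace, then replace exact whole-cell matches (regex is off by default).
GDP['Country Name'] = GDP['Country Name'].str.strip().replace(d)

result = GDP['Country Name'].tolist()
print(result)
```

Without regex=True the keys are compared literally, so the '.' in 'Korea, Rep.' is not treated as a wildcard, and values not in the dictionary are left unchanged.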
Sample:
GDP = pd.DataFrame({'Country Name': [' Korea, Rep. ', 'a', 'Iran, Islamic Rep.',
                                     'United States of America', 's',
                                     'United Kingdom of Great Britain and Northern Ireland'],
                    'Country': ['s', 'Hong Kong SAR, China', 'United States of America',
                                'Hong Kong SAR, China', 's', 'f']})
#print (GDP)
d = {'Korea, Rep.':'South Korea', 'Iran, Islamic Rep.':'Iran',
'United Kingdom of Great Britain and Northern Ireland':'United Kingdom',
'Hong Kong SAR, China':'Hong Kong', 'United States of America':'United States'}
#replace by columns
#GDP['Country Name'] = GDP['Country Name'].replace(d, regex=True)
#GDP['Country'] = GDP['Country'].replace(d, regex=True)
#replace multiple columns
GDP[['Country Name','Country']] = GDP[['Country Name','Country']].replace(d, regex=True)
print (GDP)
         Country    Country Name
0              s     South Korea
1      Hong Kong               a
2  United States            Iran
3      Hong Kong   United States
4              s               s
5              f  United Kingdom
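On the original question of why GDP.apply(...) alone did not seem to work: apply returns a new object and is not meant to modify the DataFrame in place, so the result has to be assigned back. Whether changes made to row inside the function leak into the original can depend on the column dtypes and the pandas version, and the pandas documentation notes that functions which mutate the passed object are not supported. A minimal sketch, assuming a small made-up frame (the 'Value' column and the row.copy() call are my additions for illustration):

```python
import pandas as pd

# Hypothetical frame with mixed dtypes, similar in spirit to GDP.
GDP = pd.DataFrame({'Country Name': ['Korea, Rep.', 'Japan'],
                    'Value': [1.0, 2.0]})

def replace_name(row):
    # Work on a copy so the function never mutates the object passed in.
    row = row.copy()
    if row['Country Name'] == 'Korea, Rep.':
        row['Country Name'] = 'South Korea'
    return row

# apply builds and returns a new DataFrame; GDP itself is left as it was.
renamed = GDP.apply(replace_name, axis=1)

print(GDP['Country Name'].tolist())
print(renamed['Country Name'].tolist())
```

So relying on the returned value (GDP = GDP.apply(replace_name, axis=1)) is the dependable form, even if the bare call happened to change the energy frame.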
Upvotes: 1