Liam Yi

Reputation: 5

Why does pandas' apply function sometimes change the values of a DataFrame and sometimes not?

def replace_name(row):
    if row['Country Name'] == 'Korea, Rep.':
        row['Country Name'] = 'South Korea'
    if row['Country Name'] == 'Iran, Islamic Rep.':
        row['Country Name'] = 'Iran'
    if row['Country Name'] == 'Hong Kong SAR, China':
        row['Country Name'] = 'Hong Kong'
    return row

GDP.apply(replace_name, axis = 1)

GDP is a 'pd.DataFrame'.

This time, when I search for 'South Korea', it doesn't work; the name is still 'Korea, Rep.'.

But if I change the last line of the code to this:

GDP = GDP.apply(replace_name, axis = 1)

it works.

At first, I thought the reason was that the 'apply' function can't change GDP itself, but when I worked with another dataframe, it actually did. The code is below:

def change_name(row):
    if row['Country'] == "Republic of Korea":
        row['Country'] = 'South Korea'
    if row['Country'] == 'United States of America':
        row['Country'] = 'United States'
    if row['Country'] == 'United Kingdom of Great Britain and Northern Ireland':
        row['Country'] = 'United Kingdom'
    if row['Country'] == 'China, Hong Kong Special Administrative Region':
        row['Country'] = 'Hong Kong'
    return row

energy.apply(change_name, axis = 1)

energy is also a 'pd.DataFrame'.

This time, when I search for 'United States', it works. The original name was 'United States of America', so the name was changed successfully.

The only difference between energy and GDP is that energy is read from an Excel file and GDP is read from a CSV file. So what causes the different results?

Upvotes: 0

Views: 64

Answers (1)

jezrael

Reputation: 862406

I think it is better to use replace:

d = {'Korea, Rep.':'South Korea', 'Iran, Islamic Rep.':'Iran', 
     'Hong Kong SAR, China':'Hong Kong'}
GDP['Country Name'] = GDP['Country Name'].replace(d, regex=True)
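
With regex=True the dictionary keys are treated as regular expressions, so the '.' in 'Korea, Rep.' matches any character and matches can occur inside longer strings. If only whole-cell, exact matches are wanted, the default (non-regex) form of replace may be enough. A minimal sketch, assuming a small made-up frame (`gdp` and its values are illustrative, not the asker's data):

```python
import pandas as pd

# Hypothetical stand-in for the GDP frame from the question.
gdp = pd.DataFrame({
    "Country Name": ["Korea, Rep.", "Iran, Islamic Rep.",
                     "Hong Kong SAR, China", "France"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

d = {"Korea, Rep.": "South Korea",
     "Iran, Islamic Rep.": "Iran",
     "Hong Kong SAR, China": "Hong Kong"}

# Without regex=True, only cells equal to a key are replaced;
# the result is assigned back because replace returns a new Series.
gdp["Country Name"] = gdp["Country Name"].replace(d)
names = gdp["Country Name"].tolist()
```

Names that are not keys of the dictionary (here 'France') are left unchanged.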

The difference may be caused by some whitespace in the data; maybe this helps:

GDP['Country Name'] = GDP['Country Name'].str.strip()
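
To check whether stray whitespace is really the cause, one option is to compare an equality test before and after stripping. This is only a sketch on invented data (`gdp` and the padded values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: the first and last names carry surrounding spaces,
# as can happen when a CSV file has padded fields.
gdp = pd.DataFrame({
    "Country Name": [" Korea, Rep. ", "Iran, Islamic Rep.", "France "],
})

# Exact comparison fails while the padding is still there.
match_before = bool((gdp["Country Name"] == "Korea, Rep.").any())

# Remove leading and trailing whitespace from every name.
gdp["Country Name"] = gdp["Country Name"].str.strip()

# The same comparison now finds the row.
match_after = bool((gdp["Country Name"] == "Korea, Rep.").any())
```

If `match_before` is False and `match_after` is True on the real data, padding was hiding the names from the `==` checks inside the apply function.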

Sample:

GDP = pd.DataFrame({'Country Name':[' Korea, Rep. ','a','Iran, Islamic Rep.','United States of America','s','United Kingdom of Great Britain and Northern Ireland'],
                    'Country':     ['s','Hong Kong SAR, China','United States of America','Hong Kong SAR, China','s','f']})

#print (GDP)

d = {'Korea, Rep.':'South Korea', 'Iran, Islamic Rep.':'Iran', 
     'United Kingdom of Great Britain and Northern Ireland':'United Kingdom',
     'Hong Kong SAR, China':'Hong Kong', 'United States of America':'United States'}

#replace by columns
#GDP['Country Name'] = GDP['Country Name'].replace(d, regex=True)
#GDP['Country'] = GDP['Country'].replace(d, regex=True)

#replace multiple columns
GDP[['Country Name','Country']] = GDP[['Country Name','Country']].replace(d, regex=True)
print (GDP)
         Country    Country Name
0              s     South Korea
1      Hong Kong               a
2  United States            Iran
3      Hong Kong   United States
4              s               s
5              f  United Kingdom
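
On the apply part of the question: DataFrame.apply returns a new object, and the pandas documentation notes that functions which mutate the object passed to them are not supported and can behave unexpectedly. Whether a change made to `row` inside the function reaches the original frame depends on internal details, so it seems safer to rely only on the returned value. A minimal sketch, assuming a small invented frame and a version of the question's function that works on a copy of the row:

```python
import pandas as pd

# Hypothetical frame with mixed column types, loosely modelled on GDP.
gdp = pd.DataFrame({
    "Country Name": ["Korea, Rep.", "France"],
    "Value": [1.0, 2.0],
})

def replace_name(row):
    # Copy first so the function never writes into the original frame.
    row = row.copy()
    if row["Country Name"] == "Korea, Rep.":
        row["Country Name"] = "South Korea"
    return row

# apply builds a new DataFrame from the returned rows.
result = gdp.apply(replace_name, axis=1)

new_names = result["Country Name"].tolist()
old_names = gdp["Country Name"].tolist()
```

Here `result` holds the renamed values while `gdp` keeps the originals, which is why assigning the result back (GDP = GDP.apply(...)) is the form that works reliably.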

Upvotes: 1
