How do I change values of column in dataframe while iterating over the column?

Question

I have a dataframe like this:

Cause_of_death       famous_for          name         nationality
suicide by hanging   African jazz        XYZ             South
unknown              Korean president    ABC             South
heart attack         businessman         EFG             American
heart failure        Prime Minister      LMN             Indian
heart problems       African writer      PQR             South

The dataframe is very large. I want to change values in the nationality column. For the rows where nationality is South, the famous_for column contains either Korea or Africa as part of the string. So I want to change nationality to South Africa if famous_for contains Africa, and to South Korea if famous_for contains Korea.

What I have tried:

for i in deaths['nationality']:
if (deaths['nationality']=='South'):
    if deaths['famous_for'].contains('Korea'):
        deaths['nationality']='South Korea'
    elif deaths['famous_for'].contains('Korea'):
        deaths['nationality']='South Africa'
    else:
        pass

jezrael · Accepted Answer

If many conditions are possible, use a custom function with DataFrame.apply and axis=1 to process the data row by row:

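def f(x):
    # Only rows with nationality 'South' are candidates for a rewrite
    if x['nationality'] == 'South':
        if 'Korea' in x['famous_for']:
            return 'South Korea'
        elif 'Africa' in x['famous_for']:
            return 'South Africa'
    # Every other row, including an unmatched 'South', keeps its value
    # (without this fall-through an unmatched 'South' row would return None)
    return x['nationality']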


deaths['nationality'] = deaths.apply(f, axis=1)
print (deaths)
       Cause_of_death        famous_for name   nationality
0  suicide by hanging      African jazz  XYZ  South Africa
1             unknown  Korean president  ABC   South Korea
2        heart attack       businessman  EFG      American
3       heart failure    Prime Minister  LMN        Indian
4      heart problems    African writer  PQR  South Africa

But if there are only a few conditions, use str.contains with DataFrame.loc:

mask1 = deaths['nationality'] == 'South'
mask2 = deaths['famous_for'].str.contains('Korean')
mask3 = deaths['famous_for'].str.contains('Africa')

deaths.loc[mask1 & mask2, 'nationality']='South Korea'
deaths.loc[mask1 & mask3, 'nationality']='South Africa'
print (deaths)
       Cause_of_death        famous_for name   nationality
0  suicide by hanging      African jazz  XYZ  South Africa
1             unknown  Korean president  ABC   South Korea
2        heart attack       businessman  EFG      American
3       heart failure    Prime Minister  LMN        Indian
4      heart problems    African writer  PQR  South Africa
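
For anyone who wants to run the DataFrame.loc approach end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question. The sixth row (name STU, with a missing famous_for) is a hypothetical addition, and so are the variable names is_south, has_korea and has_africa. Passing na=False to str.contains is my assumption about how missing values should be treated: such rows count as "no match" and keep their nationality.

```python
import pandas as pd

# Sample data from the question; the last row is a hypothetical
# addition that shows what happens when famous_for is missing.
deaths = pd.DataFrame({
    "Cause_of_death": ["suicide by hanging", "unknown", "heart attack",
                       "heart failure", "heart problems", "unknown"],
    "famous_for": ["African jazz", "Korean president", "businessman",
                   "Prime Minister", "African writer", None],
    "name": ["XYZ", "ABC", "EFG", "LMN", "PQR", "STU"],
    "nationality": ["South", "South", "American", "Indian", "South", "South"],
})

is_south = deaths["nationality"] == "South"
# na=False: a missing famous_for yields False instead of a missing value
has_korea = deaths["famous_for"].str.contains("Korea", na=False)
has_africa = deaths["famous_for"].str.contains("Africa", na=False)

# Assign only to the rows where both conditions hold
deaths.loc[is_south & has_korea, "nationality"] = "South Korea"
deaths.loc[is_south & has_africa, "nationality"] = "South Africa"

print(deaths)
```

Rows that are South but match neither pattern are left untouched, so they can be inspected afterwards with deaths[deaths["nationality"] == "South"].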

Another solution uses Series.mask:

mask1 = deaths['nationality'] == 'South'
mask2 = deaths['famous_for'].str.contains('Korean')
mask3 = deaths['famous_for'].str.contains('Africa')

deaths['nationality'] = deaths['nationality'].mask(mask1 & mask2, 'South Korea')
deaths['nationality'] = deaths['nationality'].mask(mask1 & mask3,'South Africa')
print (deaths)
       Cause_of_death        famous_for name   nationality
0  suicide by hanging      African jazz  XYZ  South Africa
1             unknown  Korean president  ABC   South Korea
2        heart attack       businessman  EFG      American
3       heart failure    Prime Minister  LMN        Indian
4      heart problems    African writer  PQR  South Africa
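
If the number of conditions grows but you would still like to avoid a row-wise apply, one possible alternative is numpy.select. This is not part of the answer above, only a sketch of a different technique under the same assumptions. It takes a list of boolean conditions and a matching list of replacement values, and the first condition that is true wins. The names conditions and choices are mine.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
deaths = pd.DataFrame({
    "Cause_of_death": ["suicide by hanging", "unknown", "heart attack",
                       "heart failure", "heart problems"],
    "famous_for": ["African jazz", "Korean president", "businessman",
                   "Prime Minister", "African writer"],
    "name": ["XYZ", "ABC", "EFG", "LMN", "PQR"],
    "nationality": ["South", "South", "American", "Indian", "South"],
})

is_south = deaths["nationality"] == "South"

# Conditions are checked in order; each one pairs with the choice
# at the same position in the choices list.
conditions = [
    is_south & deaths["famous_for"].str.contains("Korea", na=False),
    is_south & deaths["famous_for"].str.contains("Africa", na=False),
]
choices = ["South Korea", "South Africa"]

# Rows matching no condition fall back to their current nationality
deaths["nationality"] = np.select(conditions, choices,
                                  default=deaths["nationality"])

print(deaths)
```

Adding another rule then means appending one entry to conditions and one to choices, instead of adding another .loc or .mask line.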
