user517696

Reputation: 2672

How do I change the values of a column in a dataframe while iterating over the column?

I have a dataframe like this:

Cause_of_death       famous_for          name         nationality
suicide by hanging   African jazz        XYZ             South
unknown              Korean president    ABC             South
heart attack         businessman         EFG             American
heart failure        Prime Minister      LMN             Indian
heart problems       African writer      PQR             South

The dataframe is very large. I want to fix the nationality column. Wherever nationality is South, the famous_for column contains either Korea or Africa as part of its string. So I want to change nationality to South Africa if famous_for contains Africa, and to South Korea if famous_for contains Korea.

What I have tried:

for i in deaths['nationality']:
if (deaths['nationality']=='South'):
    if deaths['famous_for'].contains('Korea'):
        deaths['nationality']='South Korea'
    elif deaths['famous_for'].contains('Korea'):
        deaths['nationality']='South Africa'
    else:
        pass

Upvotes: 0

Views: 269

Answers (2)

Allen Qin

Reputation: 19947

You can use str.contains() to check whether the famous_for column includes Korea or Africa and set nationality accordingly.

df.loc[df.famous_for.str.contains('Korea'), 'nationality']='South Korea'

df.loc[df.famous_for.str.contains('Africa'), 'nationality']='South Africa'

df
Out[783]: 
       Cause_of_death        famous_for  name   nationality
0  suicide by hanging      African jazz   XYZ  South Africa
1             unknown  Korean president   ABC   South Korea
2        heart attack       businessman   EFG      American
3       heart failure    Prime Minister   LMN        Indian
4      heart problems    African writer   PQR  South Africa

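Or you can do this in a single statement by extracting the matched word and concatenating it onto nationality. Rows with no match get a trailing space, as the output shows: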

df.nationality = (
    df.nationality.str.cat(df.famous_for.str.extract('(Africa|Korea)',expand=False),
                           sep=' ', na_rep=''))

df
Out[801]: 
       Cause_of_death        famous_for  name    nationality
0  suicide by hanging      African jazz   XYZ   South Africa
1             unknown  Korean president   ABC    South Korea
2        heart attack       businessman   EFG      American 
3       heart failure    Prime Minister   LMN        Indian 
4      heart problems    African writer   PQR   South Africa
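
As a hedged variation on the single-statement approach, the sketch below only touches rows whose nationality is exactly South, so non-matching rows keep their value without a trailing space. The sample frame is rebuilt from the question's data, and the names `country` and `is_south` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
deaths = pd.DataFrame({
    'Cause_of_death': ['suicide by hanging', 'unknown', 'heart attack',
                       'heart failure', 'heart problems'],
    'famous_for': ['African jazz', 'Korean president', 'businessman',
                   'Prime Minister', 'African writer'],
    'name': ['XYZ', 'ABC', 'EFG', 'LMN', 'PQR'],
    'nationality': ['South', 'South', 'American', 'Indian', 'South'],
})

# Pull out 'Africa' or 'Korea' where present; rows with no match become NaN
country = deaths['famous_for'].str.extract('(Africa|Korea)', expand=False)

# Only rewrite rows that are 'South' and actually matched a country
is_south = deaths['nationality'].eq('South') & country.notna()
deaths.loc[is_south, 'nationality'] = 'South ' + country[is_south]

print(deaths)
```

Because the assignment goes through `.loc` with a boolean mask, rows such as American and Indian are never rewritten.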

Upvotes: 2

jezrael

Reputation: 862481

If many conditions are possible, use a custom function with DataFrame.apply and axis=1 to process the frame row by row:

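def f(x):
    # Only rows with the ambiguous 'South' value are rewritten
    if x['nationality'] == 'South':
        if 'Korea' in x['famous_for']:
            return 'South Korea'
        elif 'Africa' in x['famous_for']:
            return 'South Africa'
    # Everything else, including unmatched 'South' rows, keeps its value
    return x['nationality']


deaths['nationality'] = deaths.apply(f, axis=1)
print(deaths)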
       Cause_of_death        famous_for name   nationality
0  suicide by hanging      African jazz  XYZ  South Africa
1             unknown  Korean president  ABC   South Korea
2        heart attack       businessman  EFG      American
3       heart failure    Prime Minister  LMN        Indian
4      heart problems    African writer  PQR  South Africa

But if there are only a few conditions, use str.contains with DataFrame.loc:

mask1 = deaths['nationality'] == 'South'
mask2 = deaths['famous_for'].str.contains('Korean')
mask3 = deaths['famous_for'].str.contains('Africa')

deaths.loc[mask1 & mask2, 'nationality']='South Korea'
deaths.loc[mask1 & mask3, 'nationality']='South Africa'
print(deaths)
       Cause_of_death        famous_for name   nationality
0  suicide by hanging      African jazz  XYZ  South Africa
1             unknown  Korean president  ABC   South Korea
2        heart attack       businessman  EFG      American
3       heart failure    Prime Minister  LMN        Indian
4      heart problems    African writer  PQR  South Africa

Another solution uses Series.mask:

mask1 = deaths['nationality'] == 'South'
mask2 = deaths['famous_for'].str.contains('Korean')
mask3 = deaths['famous_for'].str.contains('Africa')

deaths['nationality'] = deaths['nationality'].mask(mask1 & mask2, 'South Korea')
deaths['nationality'] = deaths['nationality'].mask(mask1 & mask3,'South Africa')
print(deaths)
       Cause_of_death        famous_for name   nationality
0  suicide by hanging      African jazz  XYZ  South Africa
1             unknown  Korean president  ABC   South Korea
2        heart attack       businessman  EFG      American
3       heart failure    Prime Minister  LMN        Indian
4      heart problems    African writer  PQR  South Africa
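
If the number of keyword rules grows, one possible way to keep the vectorized `.loc` pattern is to drive it from a dict rather than one mask variable per rule. This is only a sketch: the `replacements` mapping and the variable names are assumptions made for illustration, and `na=False` is added so that missing famous_for values count as non-matches.

```python
import pandas as pd

# Sample data copied from the question
deaths = pd.DataFrame({
    'Cause_of_death': ['suicide by hanging', 'unknown', 'heart attack',
                       'heart failure', 'heart problems'],
    'famous_for': ['African jazz', 'Korean president', 'businessman',
                   'Prime Minister', 'African writer'],
    'name': ['XYZ', 'ABC', 'EFG', 'LMN', 'PQR'],
    'nationality': ['South', 'South', 'American', 'Indian', 'South'],
})

# Hypothetical mapping: keyword to look for -> replacement nationality
replacements = {'Korea': 'South Korea', 'Africa': 'South Africa'}

# Computed once, before any row is rewritten
is_south = deaths['nationality'] == 'South'

for keyword, new_value in replacements.items():
    # Plain substring match; missing values are treated as no match
    hit = is_south & deaths['famous_for'].str.contains(keyword, regex=False, na=False)
    deaths.loc[hit, 'nationality'] = new_value

print(deaths)
```

If a row matched more than one keyword, the rule applied last would win, so the order of the dict matters in that case.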

Upvotes: 1
