Jas

How to create a column in a dataframe based on another value in the row (Python)

I have the following data:

country      code  continent  plants  invertebrates  vertebrates  total
Afghanistan  AFG   Asia            5              2           33     40
Albania      ALB   Europe          5             71           61    137
Algeria      DZA   Africa         24             40           81    145

I want to add a hemisphere column whose value is determined by looking up each row's continent in one of two lists. I want to do this with a custom function (not a lambda).

I attempted the following:

northern = ['North America', 'Asia', 'Europe']
southern = ['Africa','South America', 'Oceania']

def hem(x,y):
    if y in northern:
        x = 'northern'
        return x
       
    elif y in southern:
        x = 'southern'
        return x
           
    else:
        x = 'Not Found'
        return x

species_custom['hemisphere'] = species_custom.apply(hem, args=(species_custom['continent'],), axis=1)

I receive the following error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

Any help is greatly appreciated.


Answers (1)

sitting_duck

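hem is defined as taking two arguments, x and y. With axis=1, apply calls hem once per row and passes that row as x, and the args tuple supplies y. But args=(species_custom['continent'],) passes the full continent column, so y is the same whole Series on every call, not the continent of the current row. The check y in northern then compares that Series with each list element, which produces a boolean Series, and Python cannot reduce a Series to a single True/False. That is exactly the ValueError you see.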
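A minimal sketch of one fix that keeps a named function instead of a lambda: let the function take the row and read its own continent value. The DataFrame is rebuilt from the sample data in the question and the lists are the question's; the print is only there to show the result.

```python
import pandas as pd

# Sample data from the question
species_custom = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria'],
    'code': ['AFG', 'ALB', 'DZA'],
    'continent': ['Asia', 'Europe', 'Africa'],
    'plants': [5, 5, 24],
    'invertebrates': [2, 71, 40],
    'vertebrates': [33, 61, 81],
    'total': [40, 137, 145],
})

northern = ['North America', 'Asia', 'Europe']
southern = ['Africa', 'South America', 'Oceania']


def hem(row):
    """Return the hemisphere for one row, based on its 'continent' value."""
    continent = row['continent']  # a single string, not the whole column
    if continent in northern:
        return 'northern'
    elif continent in southern:
        return 'southern'
    return 'Not Found'


# axis=1 makes apply call hem once per row, passing that row as a Series;
# no extra args are needed because hem reads the continent from the row
species_custom['hemisphere'] = species_custom.apply(hem, axis=1)
print(species_custom[['country', 'continent', 'hemisphere']])
```

Because hem now receives a single row, row['continent'] is a plain string and the in checks behave as intended. If only the continent matters, a one-argument function that takes the continent string could instead be applied to that column alone with species_custom['continent'].apply(...).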

You could simplify this with nested np.where calls, which work on the whole column at once instead of calling a Python function for each row.

import numpy as np

df['hemisphere'] = np.where(df['continent'].isin(northern), 'northern',
                            np.where(df['continent'].isin(southern), 'southern', 'Not Found'))

Result

       country code continent  plants  invertebrates  vertebrates  total  hemisphere
0  Afghanistan  AFG      Asia       5              2           33     40    northern 
1      Albania  ALB    Europe       5             71           61    137    northern 
2      Algeria  DZA    Africa      24             40           81    145    southern 
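If more cases are ever needed, nested np.where calls get hard to read. A sketch of the same logic with np.select, a swapped-in NumPy function (not part of the answer above) that takes a list of conditions, a matching list of choices, and a default. The DataFrame holds the sample continents plus a hypothetical 'Antarctica' row, added only to show the default being used.

```python
import numpy as np
import pandas as pd

# Continents from the sample data, plus a hypothetical 'Antarctica' row
# (not in the question) so the default branch is exercised
df = pd.DataFrame({'continent': ['Asia', 'Europe', 'Africa', 'Antarctica']})

northern = ['North America', 'Asia', 'Europe']
southern = ['Africa', 'South America', 'Oceania']

# Conditions are checked in order; the first one that is True for a row
# picks the matching entry in choices
conditions = [
    df['continent'].isin(northern),
    df['continent'].isin(southern),
]
choices = ['northern', 'southern']

# Rows that match no condition get the default
df['hemisphere'] = np.select(conditions, choices, default='Not Found')
print(df)
```

Like np.where, this stays vectorized over the whole column, and adding another category is one more entry in each list rather than another level of nesting.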
