Prasun Jain

Reputation: 39

Appending 1 or 0 to an empty list according to the data in a pandas column?

import pandas as pd
from pandas import DataFrame,Series
import numpy as np
titanic=pd.read_csv('C:/Users/prasun.j/Downloads/train.csv')
sex=[]
if titanic['Sex']=='male':
    sex.append(1)
else:
    sex.append(0)
sex

I'm trying to build a list that gets 1 appended when the if statement encounters male and 0 when it encounters female. I don't know what I'm doing wrong. Can someone help out? Thanks in advance. Execution throws the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-265768ba34be> in <module>()
      4 titanic=pd.read_csv('C:/Users/prasun.j/Downloads/train.csv')
      5 sex=[]
----> 6 if titanic['Sex']=='male':
      7     sex.append(1)
      8 else:

C:\anaconda\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
   1119         raise ValueError("The truth value of a {0} is ambiguous. "
   1120                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1121                          .format(self.__class__.__name__))
   1122 
   1123     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 1

Views: 78

Answers (2)

Joe

Reputation: 12417

You could also use get_dummies and drop the first column (in this case, female):

df = pd.DataFrame({'sex': ['male', 'female', 'male', 'male', 'female','male'], 'age':[10,20,30,40,50,60]})

Use pd.get_dummies to obtain your values:

sex = pd.get_dummies(df['sex'], drop_first=True, dtype=int)  # dtype=int keeps 1/0 output on pandas versions that default to bool
sex
   male
0  1
1  0
2  1
3  1
4  0
5  1

And then convert to a list:

list_sex = sex['male'].tolist()
list_sex

[1, 0, 1, 1, 0, 1]

Upvotes: 0

user3483203

Reputation: 51155

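When you check if titanic['Sex']=='male', you are comparing 'male' to the entire Series. The comparison returns a boolean Series with one value per row, not a single True or False, so the if statement cannot decide which branch to take and raises the ValueError.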
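To see this for yourself, here is a minimal sketch that reproduces the error on a small frame. The three-row DataFrame is made-up sample data standing in for the Titanic 'Sex' column, not the actual file from the question:

```python
import pandas as pd

# Hypothetical sample data standing in for the Titanic 'Sex' column.
df = pd.DataFrame({'Sex': ['male', 'female', 'male']})

# The comparison is element-wise, so the result is a boolean Series.
mask = df['Sex'] == 'male'
print(mask.tolist())

# Using that Series in an `if` asks for one truth value for all rows,
# which pandas refuses to guess.
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The printed message should match the one in your traceback, since both come from evaluating a whole Series as a single boolean.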

If you really wanted to continue with an iterative approach, you could use iterrows, and check your condition for each row. However, you should avoid iteration with Pandas, and here there is a much cleaner solution.

Setup

df = pd.DataFrame({'sex': ['male', 'female', 'male', 'male', 'female']})

Just use np.where here:

np.where(df.sex == 'male', 1, 0)
# array([1, 0, 1, 1, 0])

You could also cast the boolean comparison to integers:

(df.sex == 'male').astype(int).values.tolist()
# [1, 0, 1, 1, 0]
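For comparison, the iterative approach mentioned above could look roughly like the sketch below. It uses the same made-up setup frame and is only meant to show what the original loop was trying to do. The vectorized versions are still preferable:

```python
import pandas as pd

# Same illustrative setup as above, not the real Titanic data.
df = pd.DataFrame({'sex': ['male', 'female', 'male', 'male', 'female']})

sex = []
# iterrows yields (index, row) pairs, so the condition is checked
# against one scalar value at a time instead of the whole Series.
for _, row in df.iterrows():
    if row['sex'] == 'male':
        sex.append(1)
    else:
        sex.append(0)

print(sex)
```

Because each `row['sex']` is a plain string, the `if` gets an unambiguous True or False, which is what the code in the question was missing.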

Upvotes: 2
