Rafaó

Reputation: 599

fillna() with map(dict) fills not only NaNs, but all values

I have a DataFrame called data with some columns. One of them is Married and another one is Gender. Both variables are categorical.

>>> print(data[['Gender', 'Married']].dtypes)
Gender     category
Married    category
dtype: object

Married contains no NaN values, but Gender contains 12 NaN values, which I want to impute.

>>> print(data['Gender'].isna().sum())
12

A quick analysis showed that someone with Married='Yes' is much more likely to have Gender='Male', so I want to impute Gender like this:

Married='Yes' ->  Gender='Male'
Married='No'  ->  Gender='Female'

So I created a dictionary:

dictionary = {'Yes': 'Male', 'No': 'Female'}

Then I wrote a single line of code based on fillna():

data['Gender'].fillna(data['Married'].map(dictionary), inplace=True)

And it worked... in a totally different way than expected. It changed the whole Gender column! Every single entry is now derived from the Married column. Look at these crosstabs:

Before fillna():

Married   No  Yes
Gender           
Female    80   31
Male     129  352

After fillna():

Married   No  Yes
Gender           
Female   212    0
Male       0  392

How can I fill only the NaN Gender values, based on the Married column?

Upvotes: 3

Views: 4843

Answers (2)

yatu

Reputation: 88276

You could use np.select, which returns values from a choicelist depending on the results of the conditions:

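import numpy as np

# Rows where Gender is missing, split by marital status
n = df.Gender.isna()
m1 = n & (df.Married == 'Yes')
m2 = n & (df.Married == 'No')
# Missing rows get the imputed value; all other rows keep their Gender
np.select([m1, m2], ['Male', 'Female'], default=df.Gender)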
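
To make this concrete, here is a minimal sketch on a small made-up frame. The data and the name filled are only illustrative, and the columns use plain string values rather than the category dtype from the question. np.select returns a NumPy array, so it has to be assigned back to the column to take effect.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; plain strings, not the category dtype from the question
df = pd.DataFrame({
    "Gender":  ["Female", np.nan, np.nan, "Male", "Female"],
    "Married": ["Yes",    "No",   "Yes",  "No",   "No"],
})

n = df["Gender"].isna()
m1 = n & (df["Married"] == "Yes")   # missing Gender and married
m2 = n & (df["Married"] == "No")    # missing Gender and not married

# Rows matching neither mask fall through to the existing Gender value
filled = np.select(
    [m1, m2],
    ["Male", "Female"],
    default=df["Gender"].to_numpy(dtype=object),
)

df["Gender"] = filled               # assign the array back to the column
print(df["Gender"].tolist())
```

Only the two rows that were NaN change; the existing 'Female' with Married='Yes' and 'Male' with Married='No' are left alone.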

Upvotes: 2

jpp

Reputation: 164773

Your code looks fine. If it doesn't work, there may be a Pandas bug. You can try loc assignment with Boolean indexing instead:

mask = df['Gender'].isnull()
df.loc[mask, 'Gender'] = df.loc[mask, 'Married'].map(dictionary)
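
As a quick check of the masked assignment, here is a small self-contained sketch. The toy data is made up, and it assumes plain string columns; with category dtype the behaviour may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; plain strings, not the category dtype from the question
df = pd.DataFrame({
    "Gender":  ["Male", np.nan, "Female", np.nan, "Female"],
    "Married": ["Yes",  "Yes",  "No",     "No",   "Yes"],
})
dictionary = {"Yes": "Male", "No": "Female"}

mask = df["Gender"].isna()          # True only where Gender is missing

# Map Married to Gender for the masked rows only, and write back to those rows
df.loc[mask, "Gender"] = df.loc[mask, "Married"].map(dictionary)

print(df["Gender"].tolist())
# ['Male', 'Male', 'Female', 'Female', 'Female']
```

The last row keeps 'Female' even though Married is 'Yes', because the mask restricts the assignment to rows that were NaN.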

Upvotes: 4
