Sarah

Reputation: 617

Best way to fill NULL values with conditions using Pandas?

For example, I have data that looks like this:

import numpy as np
import pandas as pd
df = pd.DataFrame([[np.nan, '1-5'], [np.nan, '26-100'], ['Yes', 'More than 1000'], ['No', '26-100'], ['Yes', '1-5']], columns=['self_employed', 'no_employees'])
df

    self_employed   no_employees
0   nan                  1-5
1   nan                 26-100
2   Yes            More than 1000
3   No                  26-100
4   Yes                  1-5

I'm trying to fill the NULL values based on this condition:

If no_employees is '1-5' then 'Yes', otherwise 'No'

I was able to do this using a dictionary such as:

self_employed_dict = {'1-5': 'Yes', '6-25': 'No', '26-100': 'No', '100-500': 'No', 'More than 1000':'No', '500-1000': 'No'}
df['self_employed'] = df['self_employed'].fillna(df['no_employees'].map(self_employed_dict))

But I wanted to know if there is a better, simpler way of doing this. In this example I had to write the mapping dictionary myself, so how can I do this in a smarter way?

The expected output looks like this:

    self_employed   no_employees
0   Yes                  1-5
1   No                 26-100
2   Yes            More than 1000
3   No                  26-100
4   Yes                  1-5

Upvotes: 1

Views: 9216

Answers (3)

mosc9575

Reputation: 6367

The answer depends on your pandas version. There are two cases:

  1. Pandas version 1.0.0+. To check:

    print(df['self_employed'].isna().any()) prints False and/or

    type(df.iloc[0,0]) returns type str.

    In this case all elements of your dataframe are strings and fillna() will not work, because fillna() does not treat the string 'nan' as a missing value. You can use update() instead:

helper = df['no_employees'].eq('1-5').map({False: 'No', True: 'Yes'}).to_frame('self_employed')
df.update(other=helper, filter_func=lambda x: df['self_employed'].eq('nan'))
  2. Older pandas versions, where data types can be mixed. This means:

    print(df['self_employed'].isna().any()) prints True and/or

    type(df.iloc[0,0]) returns type float.

    In this case the missing values are real NaN, so fillna() works:

values = df['no_employees'].eq('1-5').map({False: 'No', True: 'Yes'})
df['self_employed'] = df['self_employed'].fillna(values)

This will get you:

  self_employed    no_employees
0           Yes             1-5
1            No          26-100
2           Yes  More than 1000
3            No          26-100
4           Yes             1-5
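
If you are unsure which of the two cases applies, one possible sketch is to build a single mask that matches both real NaN and the literal string 'nan', and then fill through Series.mask(). The sample frame below deliberately mixes both kinds of missing value. That mix is an assumption made for illustration, not the asker's exact data.

```python
import numpy as np
import pandas as pd

# Illustrative data: row 0 holds a real NaN, row 1 holds the string 'nan'
df = pd.DataFrame(
    [[np.nan, '1-5'], ['nan', '26-100'], ['Yes', 'More than 1000'],
     ['No', '26-100'], ['Yes', '1-5']],
    columns=['self_employed', 'no_employees'],
)

# True where the value is missing in either representation
missing = df['self_employed'].isna() | df['self_employed'].eq('nan')

# 'Yes' for the '1-5' bucket, 'No' for every other bucket
fill = df['no_employees'].eq('1-5').map({True: 'Yes', False: 'No'})

# Replace only the rows flagged as missing; other rows are left untouched
df['self_employed'] = df['self_employed'].mask(missing, fill)
print(df)
```

Because the mask is computed explicitly, this does not depend on how a particular pandas version stores the missing entries.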

Upvotes: 1

OK 400

Reputation: 831

You could use:

df.fillna(0)

It fills NA/NaN values with the value you want (in this case 0).

See more details here
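
Note that a single scalar gives every missing row the same value, so on its own it cannot express the condition in the question. As a small sketch of the difference, fillna() also accepts a Series, which is matched to the rows by index. The three-row frame and the variable names here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample, not the asker's full data
df = pd.DataFrame({
    'self_employed': [np.nan, np.nan, 'Yes'],
    'no_employees': ['1-5', '26-100', 'More than 1000'],
})

# Scalar fill: every missing row receives the same value
constant = df['self_employed'].fillna('No')

# Series fill: each missing row receives the value at the same index
per_row_values = df['no_employees'].eq('1-5').map({True: 'Yes', False: 'No'})
per_row = df['self_employed'].fillna(per_row_values)

print(constant.tolist())
print(per_row.tolist())
```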

Upvotes: -1

Dani Mesejo

Reputation: 61930

Using fillna is the right way to go, but instead of writing the dictionary yourself you could do:

values = df['no_employees'].eq('1-5').map({False: 'No', True: 'Yes'})
df['self_employed'] = df['self_employed'].fillna(values)
print(df)

Output

  self_employed    no_employees
0           Yes             1-5
1            No          26-100
2           Yes  More than 1000
3            No          26-100
4           Yes             1-5
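
As an alternative sketch of the same idea, the condition can be written with numpy.where() instead of map() and fillna(). This is a different technique from the one above, shown only for comparison, and the name fallback is just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, '1-5'], [np.nan, '26-100'], ['Yes', 'More than 1000'],
     ['No', '26-100'], ['Yes', '1-5']],
    columns=['self_employed', 'no_employees'],
)

# Value to use when self_employed is missing: 'Yes' only for the '1-5' bucket
fallback = np.where(df['no_employees'] == '1-5', 'Yes', 'No')

# Keep existing answers; take the fallback only where the value is NaN
df['self_employed'] = np.where(df['self_employed'].isna(), fallback, df['self_employed'])
print(df)
```

Both versions avoid listing every no_employees bucket by hand, since only the '1-5' case needs to be named.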

Upvotes: 3
