Reputation: 617
For example, I have data that looks like this:
import numpy as np
import pandas as pd
df = pd.DataFrame([[np.nan, '1-5'], [np.nan, '26-100'], ['Yes', 'More than 1000'], ['No', '26-100'], ['Yes', '1-5']], columns=['self_employed', 'no_employees'])
df
self_employed no_employees
0 nan 1-5
1 nan 26-100
2 Yes More than 1000
3 No 26-100
4 Yes 1-5
And I'm trying to fill the null values based on this condition:
If no_employees is '1-5' then 'Yes', otherwise 'No'
I was able to do this using a dictionary:
self_employed_dict = {'1-5': 'Yes', '6-25': 'No', '26-100': 'No', '100-500': 'No', 'More than 1000':'No', '500-1000': 'No'}
df['self_employed'] = df['self_employed'].fillna(df['no_employees'].map(self_employed_dict))
But I would like to know if there is a better, simpler way of doing this. In this example I had to write the mapping dictionary by hand, so is there a smarter way?
The expected output looks like this:
self_employed no_employees
0 Yes 1-5
1 No 26-100
2 Yes More than 1000
3 No 26-100
4 Yes 1-5
Upvotes: 1
Views: 9216
Reputation: 6367
The answer depends on your pandas version. There are two cases:
Pandas version 1.0.0+. To check whether this case applies:
print(df['self_employed'].isna().any())
returns False
and/or
type(df.iloc[0,0])
returns type str.
In this case all elements of your DataFrame are strings, and fillna() will not work because it does not treat the string 'nan' as a missing value. You can use update() instead:
helper = df['no_employees'].eq('1-5').map({False: 'No', True: 'Yes'}).to_frame('self_employed')
df.update(other=helper, filter_func=lambda x: df['self_employed'].eq('nan'))
Older pandas versions. Here the data types can be mixed, which means
print(df['self_employed'].isna().any())
returns True
and/or
type(df.iloc[0,0])
returns type float. In this case fillna() works directly:
values = df['no_employees'].eq('1-5').map({False: 'No', True: 'Yes'})
df['self_employed'] = df['self_employed'].fillna(values)
This will get you:
self_employed no_employees
0 Yes 1-5
1 No 26-100
2 Yes More than 1000
3 No 26-100
4 Yes 1-5
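If you would rather not branch on the pandas version, one option is to normalize the column first and then fill. The sketch below is only an illustration, not part of the original answer: it assumes the column may contain a real NaN, the literal string 'nan', or both, and the sample data is adjusted to include both cases.

```python
import numpy as np
import pandas as pd

# Sample data with one real NaN and one literal 'nan' string (an assumption
# made here so that both cases from the answer are covered at once).
df = pd.DataFrame(
    {
        "self_employed": [np.nan, "nan", "Yes", "No", "Yes"],
        "no_employees": ["1-5", "26-100", "More than 1000", "26-100", "1-5"],
    }
)

# Blank out any literal 'nan' strings so they become real missing values.
self_employed = df["self_employed"].mask(df["self_employed"].eq("nan"))

# Build the fallback from no_employees, then fill only the missing cells.
fallback = df["no_employees"].eq("1-5").map({True: "Yes", False: "No"})
df["self_employed"] = self_employed.fillna(fallback)

print(df)
```

Existing 'Yes'/'No' answers are left untouched, because fillna() only writes into cells that are missing after the mask step.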
Upvotes: 1
Reputation: 831
You could use:
df.fillna(0)
It fills NA/NaN values with the value you want (in this case 0).
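Note that a scalar passed to fillna() is written into every missing cell alike, so on its own it cannot express the "depends on no_employees" rule from the question. A small sketch of the difference, using 'No' as a hypothetical scalar fill value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "self_employed": [np.nan, np.nan, "Yes", "No", "Yes"],
        "no_employees": ["1-5", "26-100", "More than 1000", "26-100", "1-5"],
    }
)

# Scalar fill: every missing cell receives the same value.
filled_scalar = df["self_employed"].fillna("No")

# Series fill: each missing cell receives the value at the same index label.
per_row = df["no_employees"].eq("1-5").map({True: "Yes", False: "No"})
filled_per_row = df["self_employed"].fillna(per_row)

print(filled_scalar.tolist())
print(filled_per_row.tolist())
```

Row 0 has no_employees equal to '1-5', so the question expects 'Yes' there; only the Series-based fill produces that.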
Upvotes: -1
Reputation: 61930
Using fillna is the right way to go, but instead of writing the dictionary by hand you could do:
values = df['no_employees'].eq('1-5').map({False: 'No', True: 'Yes'})
df['self_employed'] = df['self_employed'].fillna(values)
print(df)
Output
self_employed no_employees
0 Yes 1-5
1 No 26-100
2 Yes More than 1000
3 No 26-100
4 Yes 1-5
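For comparison, the same rule can be written with numpy.where and a boolean mask. This is an alternative sketch rather than the method above, and it assumes the missing entries are real NaN values (not the string 'nan'):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "self_employed": [np.nan, np.nan, "Yes", "No", "Yes"],
        "no_employees": ["1-5", "26-100", "More than 1000", "26-100", "1-5"],
    }
)

# Rows that still need a value.
is_missing = df["self_employed"].isna()

# Rows whose company size matches the 'Yes' condition.
is_small = df["no_employees"].eq("1-5")

# Assign only into the missing rows, choosing 'Yes' or 'No' per row.
df.loc[is_missing, "self_employed"] = np.where(is_small[is_missing], "Yes", "No")

print(df)
```

Both versions leave existing answers as they are; the fillna() form is shorter, while the mask form makes it explicit which rows are being written.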
Upvotes: 3