Little

Reputation: 3477

Python 3.x fill missing NaN values with the mode

I have programmed the following dataframe in Python:

import pandas as pd
import numpy as np

def main():
    x=pd.DataFrame(np.array([10,"worker","France",
                             20,"eng","Italy",
                             30,"doctor","Spain",
                             40,"eng","EEUU",
                             60,"eng",np.NaN,
                             60,"worker","France"]).reshape(6,3))
    x.columns=["age","job","country"]
    x["country"]=x["country"].fillna(x["country"].mode().iloc[0])
    print (x)

I would like to replace the NaN values in the country column with the mode of that column. I have tried several methods, but it still prints the same values. What am I missing? I am using Python 3.7.

Thanks

Upvotes: 1

Views: 113

Answers (1)

Scott Boston

Reputation: 153460

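Because you are getting the string representation of np.NaN. np.array stores every element with a single dtype, so mixing numbers and strings converts everything to strings, and np.NaN becomes the string 'nan'. fillna only fills real missing values, so it finds nothing to replace and does not work as you expected.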
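A quick way to confirm this is to inspect a small array built the same way. This is only a sketch; the name arr is made up for the illustration and is not part of the question's code.

```python
import numpy as np
import pandas as pd

# Mixing ints, strings and NaN in one array forces a single string dtype.
arr = np.array([60, "eng", np.nan])

print(arr.dtype.kind)              # 'U' means a unicode string array
print(arr[2] == "nan")             # True: the NaN is now the text 'nan'
print(pd.Series(arr).isna().sum()) # 0: pandas sees no missing values
```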

Use this:

x['country'].replace('nan',np.nan).fillna(x['country'].mode()[0])

or

x['country'].mask(x['country']=='nan').fillna(x['country'].mode()[0])

Output:

0    France
1     Italy
2     Spain
3      EEUU
4    France
5    France
Name: country, dtype: object
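As an alternative, the conversion to strings can probably be avoided altogether by building the DataFrame from a list of rows instead of a reshaped np.array. The sketch below assumes the same data as the question; the name rows is hypothetical. It uses np.nan because the np.NaN alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# One inner list per row, so each column keeps its own dtype
# and the missing country stays a real NaN.
rows = [
    [10, "worker", "France"],
    [20, "eng", "Italy"],
    [30, "doctor", "Spain"],
    [40, "eng", "EEUU"],
    [60, "eng", np.nan],
    [60, "worker", "France"],
]
x = pd.DataFrame(rows, columns=["age", "job", "country"])

# mode() ignores NaN by default, so the original fillna line now works.
x["country"] = x["country"].fillna(x["country"].mode().iloc[0])
print(x["country"].tolist())
```

Built this way, the age column also stays numeric instead of becoming strings.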

Upvotes: 1
