Reputation: 47126
I have seen that in R, imputing categorical data is straightforward with packages like DMwR and caret, and I also have algorithm options such as KNN or CentralImputation. But I do not see any Python libraries that do the same. FancyImpute performs well on numeric data.
Is there a way to impute null (NaN) values in categorical data in Python?
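For comparison, the idea behind DMwR's centralImputation (fill each column with a central value: the median for numeric columns, the most frequent value for nominal ones) can be sketched in a few lines of pandas. This is a minimal sketch, not a port of the R function: central_impute is a hypothetical helper name, and it assumes missing values show up as NaN/None, as they do after pandas reads a CSV.

```python
import pandas as pd


def central_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with NaN filled by a per-column central value:
    the median for numeric columns, the mode (most frequent value) otherwise.
    Hypothetical helper modelled on the idea behind DMwR's centralImputation."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().all():
            continue  # an all-missing column has no central value to use
        if pd.api.types.is_numeric_dtype(out[col]):
            fill = out[col].median()
        else:
            # Series.mode() skips NaN and returns its modes in sorted order,
            # so a tie is broken by taking the smallest value
            fill = out[col].mode().iloc[0]
        out[col] = out[col].fillna(fill)
    return out
```

The input frame is left untouched; the filled copy is returned.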
Edit: Added the top few rows of the data set.
>>> data_set.head()
   1stFlrSF  2ndFlrSF  3SsnPorch  Alley  BedroomAbvGr  BldgType  BsmtCond  \
0       856       854          0    NaN             3      1Fam        TA
1      1262         0          0    NaN             3      1Fam        TA
2       920       866          0    NaN             3      1Fam        TA
3       961       756          0    NaN             3      1Fam        Gd
4      1145      1053          0    NaN             4      1Fam        TA

   BsmtExposure  BsmtFinSF1  BsmtFinSF2  ...  SaleType  ScreenPorch  Street  \
0            No       706.0         0.0  ...        WD            0    Pave
1            Gd       978.0         0.0  ...        WD            0    Pave
2            Mn       486.0         0.0  ...        WD            0    Pave
3            No       216.0         0.0  ...        WD            0    Pave
4            Av       655.0         0.0  ...        WD            0    Pave

   TotRmsAbvGrd  TotalBsmtSF  Utilities  WoodDeckSF  YearBuilt  YearRemodAdd  \
0             8        856.0     AllPub           0       2003          2003
1             6       1262.0     AllPub         298       1976          1976
2             6        920.0     AllPub           0       2001          2002
3             7        756.0     AllPub           0       1915          1970
4             9       1145.0     AllPub         192       2000          2000

   YrSold
0    2008
1    2007
2    2008
3    2006
4    2008

[5 rows x 81 columns]
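The KNN option mentioned above can be approximated in plain NumPy/pandas: for each row where a categorical column is missing, find the k most similar rows (by distance on some numeric columns) where it is known, and take a majority vote. A minimal sketch under these assumptions: knn_impute_categorical is a hypothetical helper, the feature columns are numeric with no missing values and non-zero variance, and they are z-score standardized so that large-scale columns such as square footage do not dominate the distance.

```python
from collections import Counter

import numpy as np
import pandas as pd


def knn_impute_categorical(df, target, features, k=3):
    """Return a copy of df where NaN in the categorical column `target` is
    replaced by the majority label of the k nearest rows with a known label.
    Hypothetical helper; assumes `features` are numeric, complete, and
    have non-zero variance."""
    out = df.copy()
    X = df[features].astype(float)
    X = (X - X.mean()) / X.std(ddof=0)  # z-score each feature column
    X_all = X.to_numpy()
    known = df[target].notna().to_numpy()
    X_known = X_all[known]
    y_known = df[target].to_numpy()[known]
    col = out.columns.get_loc(target)
    for pos in np.flatnonzero(~known):
        # Euclidean distance from this row to every row with a known label
        dist = np.sqrt(((X_known - X_all[pos]) ** 2).sum(axis=1))
        nearest = np.argsort(dist, kind="stable")[:k]
        # Labels are counted in distance order, so on a tie most_common
        # prefers the label whose first neighbour is closest
        label = Counter(y_known[nearest]).most_common(1)[0][0]
        out.iloc[pos, col] = label
    return out
```

On the data above, something like knn_impute_categorical(data_set, "BsmtExposure", ["1stFlrSF", "YearBuilt"]) could be tried, provided the chosen feature columns have no NaN of their own.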
Upvotes: 0
Views: 4556
Reputation: 351
There are a few ways to deal with missing values. As I understand it, you want to fill NaN values according to a specific rule, and pandas fillna can do that. The code below shows how to fill the NaN values in a categorical column with that column's most frequent value.
# fill NaN in 'Alley' with the most frequent value of 'Alley' itself
df['Alley'] = df['Alley'].fillna(df['Alley'].value_counts().index[0])
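To apply the same most-frequent-value rule to every categorical column at once, one option is to pass fillna a Series of per-column modes. A small self-contained sketch: the rows are made up (only the column names and the YrSold values come from the question), and "categorical" is assumed to mean any non-numeric column.

```python
import numpy as np
import pandas as pd

# Toy rows (made up) using column names from the question's data set
df = pd.DataFrame({
    "Alley": ["Grvl", np.nan, "Pave", "Grvl", np.nan],
    "MSZoning": ["RL", "RM", "RL", np.nan, "RL"],
    "YrSold": [2008, 2007, 2008, 2006, 2008],
})

# Treat every non-numeric column as categorical
cat_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

# mode() returns a DataFrame of modes; its first row holds one mode per column
# (the smallest one on ties), and fillna aligns that Series on column names
df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])
```

Numeric columns such as YrSold are left alone and can be handled separately, for example with their median.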
sklearn.preprocessing.Imputer might also be helpful (it was removed in scikit-learn 0.22 in favour of sklearn.impute.SimpleImputer).
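With sklearn.impute.SimpleImputer (scikit-learn 0.20 or later assumed), strategy="most_frequent" also works on string columns. A minimal sketch on made-up rows; fit_transform returns a NumPy array, so it is wrapped back into a DataFrame here.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up rows; dtype=object keeps the strings (and NaN) as plain Python objects
df = pd.DataFrame({
    "Alley": ["Grvl", np.nan, "Pave", "Grvl"],
    "BsmtExposure": ["No", "Gd", np.nan, "No"],
}, dtype=object)

# most_frequent learns each column's mode during fit and applies it in transform
imputer = SimpleImputer(strategy="most_frequent")
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns, index=df.index)
```

Because the fitted imputer stores the learned modes in statistics_, the same object can later be applied to a test set with transform, which a one-off fillna does not give you.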
For more information about fillna, see the pandas.DataFrame.fillna documentation.
Hope this helps.
Upvotes: 3