user12217822

Reputation: 302

How to fill missing values with KNN in Python

I'm trying to fill missing values with KNN in Python, so I wrote this code, but it doesn't work. I get this error: "ValueError: could not convert string to float: 'normal'". What should I do?

import pandas as pd
from sklearn.impute import KNNImputer

df = pd.read_csv(r'df.csv')
imputer = KNNImputer(n_neighbors=5)
df = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

Upvotes: 1

Views: 6997

Answers (3)

AlexTorx

Reputation: 775

Usually, to replace NaN values we use sklearn.impute.SimpleImputer, which can replace them with a value of your choice (the mean or median of the column, or any other value you like).

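import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Replace NaN in each column with that column's mean (numeric columns only)
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
df = pd.DataFrame(imp.fit_transform(df), columns=df.columns)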
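Note that `strategy='mean'` only works on numeric columns, so on the question's data (which contains strings such as 'normal') it would fail with a similar ValueError. A minimal sketch, assuming a hypothetical DataFrame with a numeric `age` column and a text `status` column (both names invented for illustration), that uses `ColumnTransformer` to apply `mean` to numeric columns and `most_frequent` to text columns:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical data: one numeric column and one text column, both with gaps.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "status": ["normal", "high", np.nan, "normal"],
})

num_cols = df.select_dtypes(include="number").columns.tolist()
cat_cols = df.select_dtypes(exclude="number").columns.tolist()

# Column mean for numeric columns, most frequent value for text columns.
ct = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), num_cols),
    ("cat", SimpleImputer(strategy="most_frequent"), cat_cols),
])

# Output columns come back in transformer order: numeric first, then text.
filled = pd.DataFrame(ct.fit_transform(df), columns=num_cols + cat_cols)
print(filled)
```

Because numeric and text results are stacked together, the output is an object-dtype array; you may want to cast the numeric columns back with `astype(float)`.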

Upvotes: 1

LittleHealth

Reputation: 122

The KNN method computes distances between vectors, so if your data is categorical, you need to convert it to numbers first. For example, if the strings represent labels, you could one-hot encode them.
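A minimal sketch of the one-hot approach, using hypothetical `age` and `status` columns. One detail matters: `pd.get_dummies` turns a missing label into an all-zero row, so the sketch resets those rows to NaN before imputing. The imputed dummy values are averages of the neighbors' 0/1 values, and the largest one is taken as the predicted label.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical data; "status" is a text column like the one in the question.
df = pd.DataFrame({
    "age": [25.0, 30.0, 41.0, 39.0, 28.0],
    "status": ["normal", "normal", "high", "high", np.nan],
})
missing = df["status"].isna()

# One-hot encode; get_dummies gives an all-zero row for NaN,
# so mark those rows as NaN again so the imputer treats them as missing.
dummies = pd.get_dummies(df["status"], prefix="status", dtype=float)
dummies.loc[missing] = np.nan

encoded = pd.concat([df[["age"]], dummies], axis=1)

imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(encoded), columns=encoded.columns)

# Pick the dummy column with the largest averaged value as the label,
# and only overwrite the rows that were originally missing.
predicted = imputed[dummies.columns].idxmax(axis=1).str.replace("status_", "", regex=False)
df.loc[missing, "status"] = predicted[missing]
print(df)
```

KNN distances are scale-sensitive, so with real data you would probably scale the numeric columns (for example with `MinMaxScaler`) before imputing, so that they do not dominate the 0/1 dummy columns.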

There is also another Python package that implements KNN imputation: impyte

Upvotes: 0

giraycoskun

Reputation: 161

I don't know what your df looks like, but I guess you may have to encode the text columns first (for example with an ordinal or label encoder), because KNNImputer does not work with text data.

Here is a guide for you:

https://medium.com/@kyawsawhtoon/a-guide-to-knn-imputation-95e2dc496e
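The ordinal-encoding route can be sketched with pandas category codes instead of scikit-learn's `OrdinalEncoder`; the codes are -1 for missing values, which are turned back into NaN for the imputer. This is a sketch on hypothetical data. Rounding averaged codes back to a category only makes sense when the categories have a natural order or there are just two of them; otherwise one-hot encoding is usually safer.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical data with a text column containing a missing value.
df = pd.DataFrame({
    "age": [25.0, 30.0, 41.0, 39.0, 40.0],
    "status": ["normal", "normal", "high", "high", np.nan],
})

# Ordinal-encode each text column: category codes are -1 for NaN,
# so keep only non-negative codes and leave NaN elsewhere.
text_cols = df.select_dtypes(exclude="number").columns
categories = {}
encoded = df.copy()
for col in text_cols:
    cat = encoded[col].astype("category")
    categories[col] = cat.cat.categories
    codes = cat.cat.codes
    encoded[col] = codes.where(codes >= 0)

imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(encoded),
                       columns=encoded.columns, index=df.index)

# Averaged codes can be fractional: round to the nearest valid code,
# then map the codes back to the original strings.
for col in text_cols:
    codes = imputed[col].round().astype(int).to_numpy()
    imputed[col] = pd.Categorical.from_codes(codes, categories=categories[col])
print(imputed)
```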

Upvotes: 0
