Reputation: 302
I'm trying to fill missing values with KNN in Python, so I wrote the code below, but it doesn't work. I get this error: "ValueError: could not convert string to float: 'normal'". What should I do?
import pandas as pd
df = pd.read_csv(r'df.csv')
from sklearn.impute import KNNImputer
imputer = KNNImputer(n_neighbors=5)
df = pd.DataFrame(imputer.fit_transform(df),columns = df.columns)
Upvotes: 1
Views: 6997
Reputation: 775
Usually, to replace NaN values we use sklearn.impute.SimpleImputer, which can replace NaN values with a value of your choice (the mean or median of the sample, or any other value you like).
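import numpy as np
from sklearn.impute import SimpleImputer
# strategy='mean' only works on numeric columns
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
# fit_transform returns a NumPy array, not a DataFrame
df = imp.fit_transform(df)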
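Since the frame in the question apparently contains text values such as 'normal', a mean strategy alone would likely hit the same conversion error. A minimal sketch of one way around it, assuming a made-up toy frame with hypothetical columns "age" and "status": impute numeric columns with the mean and text columns with the most frequent value.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; column names and values are made up for illustration.
df = pd.DataFrame({
    "age": [25.0, 30.0, np.nan, 40.0],
    "status": pd.Series(["normal", "high", "normal", np.nan], dtype=object),
})

filled = df.copy()

# Numeric column: replace NaN with the column mean.
mean_imp = SimpleImputer(strategy="mean")
filled["age"] = mean_imp.fit_transform(df[["age"]]).ravel()

# Text column: replace NaN with the most frequent label.
mode_imp = SimpleImputer(strategy="most_frequent")
filled["status"] = mode_imp.fit_transform(df[["status"]]).ravel()

print(filled)
```

This is not KNN imputation, just the simpler per-column replacement this answer describes.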
Upvotes: 1
Reputation: 122
The KNN method computes the distance between vectors, so if your data is categorical, you should convert it to numerical values first. For example, if the strings are labels, you could one-hot encode them.
There is another Python package that implements the KNN imputation method: impyte
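The one-hot idea above could look roughly like this. It is only a sketch on a made-up toy frame (the columns "age" and "status" are hypothetical), and it assumes the label column itself has no missing values, so only the numeric column needs imputing.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data; only "age" has a missing value here.
df = pd.DataFrame({
    "age": [25.0, 30.0, np.nan, 40.0, 35.0, 28.0],
    "status": pd.Series(
        ["normal", "high", "normal", "high", "high", "normal"], dtype=object
    ),
})

# One-hot encode the label column so every column is numeric.
numeric = pd.get_dummies(df, columns=["status"], dtype=float)

# KNNImputer can now compute distances between rows.
imputer = KNNImputer(n_neighbors=2)
filled = pd.DataFrame(
    imputer.fit_transform(numeric),
    columns=numeric.columns,
    index=numeric.index,
)

print(filled)
```

If the label column also had gaps, the dummy columns for those rows would need to be set back to NaN before imputing, which this sketch does not cover.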
Upvotes: 0
Reputation: 161
I don't know what your df looks like, but I guess you might have to use Ordinal or Label Encoders, as KNNImputer does not work with text data.
Here is a guide for you:
https://medium.com/@kyawsawhtoon/a-guide-to-knn-imputation-95e2dc496e
Upvotes: 0