Reputation: 35
One of the columns in my dataset is a boolean. The dataset has missing values, both in that column and in the continuous columns. I have successfully replaced the missing values in the continuous columns with the column mean, but the mean cannot be used for a missing boolean. So how can I replace those values?
Note that the boolean is 1 or 0 in my dataset.
Below is the code for replacing continuous missing values:
import numpy as np
from sklearn.impute import SimpleImputer
# Replace each NaN with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(x)
x = imputer.transform(x)
Thank You
Upvotes: 3
Views: 1558
Reputation: 5304
You can treat this boolean variable as a categorical feature and then use a SimpleImputer with the most_frequent strategy instead of mean.
You can do it as follows:
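from sklearn.impute import SimpleImputer
import numpy as np
# Create a sample 0/1 column (100 rows, 1 feature) with NaN in every fourth row
X = np.random.randint(2, size=100).reshape(-1, 1).astype(float)
X[::4, 0] = np.nan
# Fill each NaN with the most frequent value of the column
X_imputed = SimpleImputer(strategy="most_frequent").fit_transform(X)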
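Since the question has both continuous columns and a boolean column in the same array, one way to combine the two strategies is scikit-learn's ColumnTransformer. The following is only a minimal sketch: the sample array and the column positions (0 and 1 continuous, 2 boolean) are made up for illustration and are not taken from the question's data.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical data: columns 0 and 1 are continuous, column 2 is a 0/1 flag.
x = np.array([
    [1.0, 10.0, 1.0],
    [2.0, np.nan, 0.0],
    [np.nan, 30.0, np.nan],
    [4.0, 40.0, 1.0],
])

# Mean for the continuous columns, most frequent value for the boolean column.
imputer = ColumnTransformer([
    ("continuous", SimpleImputer(strategy="mean"), [0, 1]),
    ("boolean", SimpleImputer(strategy="most_frequent"), [2]),
])
x_imputed = imputer.fit_transform(x)
```

Note that ColumnTransformer returns the columns in the order the transformers are listed, so if the boolean column is not last in your data, the output column order will differ from the input.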
Upvotes: 1
Reputation: 35
There are several methods to attack this issue.
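As one illustration of such a method (my own pick, not necessarily what was meant above), you can fill the gaps with the most frequent value and also keep a separate indicator column that records which rows were originally missing, using the add_indicator option of SimpleImputer. The sample column below is hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical 0/1 column with two missing entries.
flag = np.array([[1.0], [0.0], [np.nan], [1.0], [np.nan]])

# add_indicator=True appends one extra column per feature that had
# missing values during fit, marking the rows that were imputed.
imputer = SimpleImputer(strategy="most_frequent", add_indicator=True)
flag_imputed = imputer.fit_transform(flag)

# Column 0: the flag with gaps filled by the most frequent value.
# Column 1: 1.0 where the original value was missing, 0.0 otherwise.
```

This keeps the information that a value was missing available to a downstream model, which may matter if the missingness itself is informative.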
Upvotes: 1