Reputation: 35

How can I replace missing boolean values using Python?

One column in my dataset is boolean, and it has missing values. Missing values in the continuous columns are replaced with the column mean without any problem, but the mean is not a valid replacement for a missing boolean. How can I replace those values?

Note that the boolean is 1 or 0 in my dataset.

Below is the code for replacing continuous missing values:

import numpy as np
from sklearn.impute import SimpleImputer

# x is the feature matrix, with np.nan marking the missing entries
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(x)
x = imputer.transform(x)

Thank You

Upvotes: 3

Answers (2)

Antoine Dubuis

Reputation: 5324

You can treat this boolean variable as a categorical feature and then use a SimpleImputer with the most_frequent strategy instead of mean.

You can do it as follows:

from sklearn.impute import SimpleImputer
import numpy as np

# Create a sample 0/1 column with NaNs.
# SimpleImputer works column by column, so the data is shaped (n_samples, 1).
X = np.random.randint(2, size=100).reshape(-1, 1).astype(float)
X[::4, 0] = np.nan

SimpleImputer(strategy="most_frequent").fit_transform(X)
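
Since the question mixes continuous columns (imputed with the mean) and a 0/1 column, here is a minimal sketch of how both strategies could be applied in one step with scikit-learn's ColumnTransformer. The sample array and the column positions are made up for illustration; they are assumptions, not taken from the asker's data.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical data: columns 0 and 1 are continuous, column 2 is a 0/1 flag.
X = np.array([
    [1.0, 10.0, 1.0],
    [2.0, np.nan, 0.0],
    [np.nan, 30.0, 1.0],
    [4.0, 40.0, np.nan],
    [5.0, 50.0, 1.0],
])

# Mean for the continuous columns, most frequent value for the boolean one.
imputer = ColumnTransformer([
    ("continuous", SimpleImputer(strategy="mean"), [0, 1]),
    ("boolean", SimpleImputer(strategy="most_frequent"), [2]),
])

X_imputed = imputer.fit_transform(X)
print(X_imputed)
```

The output columns follow the order of the transformers, so listing the continuous columns first and the boolean column last keeps the original layout in this example.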

Upvotes: 1

Michael Fleicher Tal

Reputation: 35

There are several ways to handle this:

If you can afford it (that is, you have enough data), exclude the rows with missing values.
Replace the missing values with the majority value (the boolean analogue of replacing a continuous value with the mean).
For time series, replace the missing cell with the mean of the x cells before and after it, then apply a threshold: if the mean is above the threshold the value becomes 1, otherwise it becomes 0.
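
The three options above could be sketched with pandas roughly as follows. The sample series, the window size, and the 0.5 threshold are all assumptions chosen for illustration; the answer does not specify them.

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 series with two missing entries.
s = pd.Series([1, 1, np.nan, 1, 0, np.nan, 0, 0, 1], dtype=float)

# Option 1: drop the rows with missing values.
dropped = s.dropna()

# Option 2: fill with the majority (most frequent) value.
majority = s.mode().iloc[0]
filled_mode = s.fillna(majority)

# Option 3 (time series): mean of the neighbours, then a threshold.
window = 2        # assumed: look 2 cells before and 2 cells after
threshold = 0.5   # assumed: above this the value becomes 1, otherwise 0
# rolling().mean() skips NaN, so the missing cell itself does not count.
local_mean = s.rolling(2 * window + 1, center=True, min_periods=1).mean()
# A window with no observed values gives NaN, which compares as False (0).
local_vote = (local_mean > threshold).astype(float)
filled_local = s.fillna(local_vote)

print(filled_mode.tolist())
print(filled_local.tolist())
```

Note that the majority fill gives the same value everywhere, while the neighbour-based fill can differ from one gap to the next, which may suit data where the flag changes slowly over time.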

Upvotes: 1
