Imputation on the test set with fancyimpute

Question

The Python package fancyimpute provides several methods for imputing missing values. Its documentation includes examples such as:

from fancyimpute import IterativeImputer

# X is the complete data matrix
# X_incomplete has the same values as X except that a subset have been replaced with NaN

# Model each feature with missing values as a function of other features, and
# use that estimate for imputation.
X_filled_ii = IterativeImputer().fit_transform(X_incomplete)
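The comment in that example describes a round-robin strategy. The NumPy sketch below shows the idea. It starts from column means, then repeatedly regresses each incomplete column on all the other columns and overwrites that column's missing entries with the predictions. The function name `round_robin_impute` and its `n_iter` parameter are hypothetical. Ordinary least squares stands in for whatever estimator the library actually uses, and the sketch assumes every column has at least one observed value. It illustrates the technique and is not fancyimpute's implementation.

```python
import numpy as np


def round_robin_impute(X, n_iter=10):
    """Hypothetical helper for illustration (not part of fancyimpute).

    Each column containing NaNs is regressed, with ordinary least squares,
    on all other columns; its missing entries are replaced by the predictions.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: replace every NaN with its column's observed mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rows_missing = missing[:, j]
            if not rows_missing.any():
                continue  # nothing to impute in this column
            # Predictors: the current values of all other columns plus an intercept.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(filled)), others])
            observed = ~rows_missing
            # Fit only on rows where column j was actually observed.
            coef, *_ = np.linalg.lstsq(A[observed], filled[observed, j], rcond=None)
            # Overwrite only the originally missing entries of column j.
            filled[rows_missing, j] = A[rows_missing] @ coef
    return filled
```

For example, `round_robin_impute([[1, 2], [2, 4], [3, 6], [4, np.nan]])` fills the missing entry from the linear relation between the two columns on the observed rows, and it leaves the observed values untouched.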

This works fine when applying the imputation method to a dataset X. But what if a training/test split is necessary? Once

X_train_filled = IterativeImputer().fit_transform(X_train_incomplete)

is called, how do I impute the test set and create X_test_filled? The test set needs to be imputed using the information learned from the training set. I guess IterativeImputer() should return an object that can then be applied to X_test_incomplete. Is that possible?

Please note that imputing on the whole dataset and then splitting it into training and test sets is not correct, because the imputed training values would then depend on information from the test set.
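The toy example below illustrates that point. It uses plain column-mean imputation as a simpler stand-in for IterativeImputer, and the data and variable names are made up. When the fill value is computed on the whole dataset, the imputed training entries depend on values that belong to the test set.

```python
import numpy as np

# Hypothetical toy data: one feature; the first three rows are the training
# set and the last two rows are the test set.
X = np.array([[1.0], [np.nan], [3.0], [100.0], [np.nan]])
X_train, X_test = X[:3], X[3:]

# Fill value learned from the whole dataset: pulled up by the test value 100.0.
leaky_fill = np.nanmean(X, axis=0)
# Fill value learned from the training rows only.
train_fill = np.nanmean(X_train, axis=0)

# The missing training entry (row 1) gets a different value in each case.
X_train_leaky = np.where(np.isnan(X_train), leaky_fill, X_train)
X_train_clean = np.where(np.isnan(X_train), train_fill, X_train)

# Correct approach: the test set is filled with the value learned from training.
X_test_clean = np.where(np.isnan(X_test), train_fill, X_test)
print(X_train_leaky[1, 0], X_train_clean[1, 0], X_test_clean[1, 0])
```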

Scratch'N'Purr · Accepted Answer

The package appears to mimic scikit-learn's API, and after looking at the source code, it does have a transform method.

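from fancyimpute import IterativeImputer

my_imputer = IterativeImputer()
# fit on the training set only, and fill in its missing values
X_train_filled = my_imputer.fit_transform(X_train_incomplete)

# now transform the test set with the already-fitted imputer
X_test_filled = my_imputer.transform(X_test_incomplete)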

The imputer fills in the test set using what it learned from the training set. Calling transform does not refit it on the test data.
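The example below runs the same fit-on-train, transform-on-test pattern end to end with scikit-learn's own `IterativeImputer`, which follows this fit/transform interface. It is still marked experimental in scikit-learn, so it has to be enabled with the `enable_iterative_imputer` import first. The synthetic data, the 10% missing rate and the 75/25 split are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required opt-in)
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split

# Synthetic data: column 2 is predictable from columns 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + X[:, 1]

# Knock out roughly 10% of the entries at random.
X_incomplete = X.copy()
X_incomplete[rng.random(X.shape) < 0.1] = np.nan

# Split first, then impute.
X_train_incomplete, X_test_incomplete = train_test_split(
    X_incomplete, test_size=0.25, random_state=0
)

imputer = IterativeImputer(random_state=0)
X_train_filled = imputer.fit_transform(X_train_incomplete)  # learns from training rows only
X_test_filled = imputer.transform(X_test_incomplete)        # reuses the fitted models
```

Only the `fit_transform` call on the training rows updates the imputer's fitted state. The later `transform` call reuses that state, so the test rows never influence the imputation model.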
