Reputation: 3821
The Python package fancyimpute provides several methods for imputing missing values. The documentation gives examples such as:
# X is the complete data matrix
# X_incomplete has the same values as X, except that a subset has been replaced with NaN
# Model each feature with missing values as a function of other features, and
# use that estimate for imputation.
X_filled_ii = IterativeImputer().fit_transform(X_incomplete)
This works fine when applying the imputation method to a single dataset X. But what if a training/test split is necessary? Once
X_train_filled = IterativeImputer().fit_transform(X_train_incomplete)
is called, how do I impute the test set and create X_test_filled? The test set needs to be imputed using information from the training set. I would expect IterativeImputer() to return an object that can then be applied to X_test_incomplete. Is that possible?
Please note that imputing the whole dataset and then splitting it into training and test sets is not correct, because information from the test set would leak into the imputation.
Upvotes: 2
Views: 3545
Reputation: 10419
The package looks like it mimics scikit-learn's API, and after looking at the source code, it does appear to have a transform method.
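my_imputer = IterativeImputer()
# fit on the training set only, and fill it in the same step
X_train_filled = my_imputer.fit_transform(X_train_incomplete)
# now transform the test set with the already-fitted imputer (no refitting)
X_test_filled = my_imputer.transform(X_test_incomplete)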
The imputer will apply to the test set the imputation models that it learned from the training set, so nothing is fitted on the test data.
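For a self-contained illustration, here is a minimal sketch of the same fit-on-train, transform-on-test pattern. It assumes scikit-learn's own IterativeImputer (imported from sklearn.impute) rather than the fancyimpute class, since that is the API being mimicked; the synthetic data, the 10% missing rate, and the variable names are made up for the example.

```python
import numpy as np
# IterativeImputer is experimental in scikit-learn and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split

# Synthetic complete data (assumption: purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)  # make one feature predictable

# Knock out roughly 10% of the entries
X_incomplete = X.copy()
X_incomplete[rng.random(X.shape) < 0.1] = np.nan

# Split BEFORE imputing, so the test set cannot influence the imputer
X_train_incomplete, X_test_incomplete = train_test_split(
    X_incomplete, test_size=0.25, random_state=0
)

my_imputer = IterativeImputer(random_state=0)
# Fit the per-feature models on the training set and fill it
X_train_filled = my_imputer.fit_transform(X_train_incomplete)
# Reuse the fitted models on the test set; nothing is refitted here
X_test_filled = my_imputer.transform(X_test_incomplete)
```

If the fancyimpute version you have installed exposes the same fit_transform/transform pair, swapping the import should be the only change needed, but check that against your installed version.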
Upvotes: 5