Maxipet

Reputation: 3

How to extract best estimator of a SequentialFeatureSelector

I have trained a SequentialFeatureSelector from sklearn and am now interested in the best model (based on the given scoring method) it produced. Is there a way to extract its parameters and use them to generate the model that was used?

I have seen that there exists a get_params() function for the SequentialFeatureSelector, but I don't understand how to interpret its output and retrieve the best estimator.

Upvotes: 0

Views: 827

Answers (1)

MuhammedYunus

Reputation: 5010

The main result of this model is which features it decided to select. You can access that information in various ways. Suppose you have a fitted selector = SequentialFeatureSelector(...).fit(...).

selector.support_ is a boolean vector where True means that feature was selected. If you started off with 5 features and told it to select 2, the vector will be [True, False, False, False, True] if it selected the first and last features.

You can get the same output using selector.get_support(). If you want the indices rather than a boolean vector, use selector.get_support(indices=True) - it'll return [0, 4] in this case, indicating feature number 0 and feature number 4.
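Here's a minimal sketch showing all three accessors (toy data generated with make_classification; which indices get selected will depend on your data):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Toy data (hypothetical example): 100 samples, 5 features
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

selector = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2
).fit(X, y)

print(selector.support_)                   # boolean mask, e.g. [ True False False False  True]
print(selector.get_support())              # same mask
print(selector.get_support(indices=True))  # e.g. [0 4]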

To get the feature names (only applies if you fed the model a dataframe):

selector.feature_names_in_[selector.support_]
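For instance, continuing the sketch above but wrapping X in a DataFrame with hypothetical column names:

import pandas as pd

# Hypothetical column names; fitting on a DataFrame makes the selector record them
X_df = pd.DataFrame(X, columns=['a', 'b', 'c', 'd', 'e'])
selector_df = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2
).fit(X_df, y)

print(selector_df.feature_names_in_[selector_df.support_])  # e.g. ['a' 'e']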

After fitting the selector, you can strip out the unselected features from new data with selector.transform(X_test), which applies the already-fitted selector to the supplied data. In this example, if X_test is 100 x 5, it'll return a 100 x 2 version that keeps only the features determined by the initial .fit().
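A quick shape check with some hypothetical new data (continuing the array-based sketch above):

import numpy as np

X_test = np.random.rand(100, 5)          # hypothetical new data, 100 x 5
print(selector.transform(X_test).shape)  # (100, 2): only the selected columns remain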

SequentialFeatureSelector doesn't keep any of the models fitted during cross-validation. So I think you'd need to fit a new model using the selected features:

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Fit selector (X, y: your training data)
selector = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2
).fit(X, y)

print('Selected feature numbers are', selector.get_support(indices=True))

# Use the fitted selector to reduce X to the selected features
X_reduced = selector.transform(X)

# Fit a fresh logistic regression on the selected features only
logreg_fitted = LogisticRegression().fit(X_reduced, y)

Alternatively, you can clone the selector's estimator. clone returns a fresh, unfitted copy with the same hyperparameters, which keeps you consistent with the original estimator without having to manually respecify all its parameters:

from sklearn.base import clone

# clone copies the hyperparameters into a new, unfitted estimator
best_model = clone(selector.estimator).fit(selector.transform(X), y)

If you want identical models (down to the random seed), it'll also be necessary to set up the CV appropriately.
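For instance, a sketch that pins down both the CV splits and the estimator's seed (the parameter values here are illustrative):

from sklearn.model_selection import StratifiedKFold

# Fixed CV splits; random_state on the estimator only matters for stochastic solvers
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
selector = SequentialFeatureSelector(
    LogisticRegression(random_state=0), n_features_to_select=2, cv=cv
).fit(X, y)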

Upvotes: 0
