Reputation: 3
I have trained a SequentialFeatureSelector from sklearn and am now interested in the best model (based on the given scoring method) it produced. Is there a way of extracting the parameters and using them to generate the model that was used?
I have seen that there is a get_params() function for the SequentialFeatureSelector, but I don't understand how to interpret its output and retrieve the best estimator.
Upvotes: 0
Views: 827
Reputation: 5010
The main result of this model is which features it decided to select. You can access that information in various ways. Suppose you have fitted a selector = SequentialFeatureSelector(...).fit(...).
selector.support_ is a boolean vector, where True means it selected that feature. If you started off with 5 features and told it to select 2, then the vector will be [True, False, False, False, True] if it selected the first and last feature.
You can get the same output as above using selector.get_support(). If you want the indices rather than a boolean vector, you can use selector.get_support(indices=True) - it'll return [0, 4] in this case, indicating feature number 0 and feature number 4.
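For example, here's a minimal sketch using a synthetic dataset and a LogisticRegression estimator (both are assumptions for illustration; which features get selected depends on your data):
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic data: 100 samples, 5 features (illustrative only)
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

selector = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2
).fit(X, y)

print(selector.support_)                   # boolean mask, e.g. [False True False False True]
print(selector.get_support(indices=True))  # integer indices, e.g. [1 4]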
To get the feature names (only applies if you fed the model a dataframe):
selector.feature_names_in_[selector.support_]
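For instance, continuing the sketch above but wrapping X in a pandas DataFrame (the column names and the selector_df name are made-up assumptions):
import pandas as pd

# Give the synthetic features some names (illustrative only)
X_df = pd.DataFrame(X, columns=['a', 'b', 'c', 'd', 'e'])
selector_df = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2
).fit(X_df, y)
print(selector_df.feature_names_in_[selector_df.support_])  # e.g. ['b' 'e']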
After fitting the selector, if you want it to strip out the unselected features, you can use selector.transform(X_test). The .transform(X_test) call applies the already-fitted selector to the supplied data. In this example, if X_test is 100 x 5, then it'll return a 100 x 2 version where it has only kept the features determined from the initial .fit().
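Continuing the first sketch above, a quick illustration of the shape change (X_test here is just more synthetic data, an assumption for illustration):
# New data with the same 5 columns
X_test, _ = make_classification(n_samples=100, n_features=5, random_state=1)
X_test_reduced = selector.transform(X_test)
print(X_test.shape, '->', X_test_reduced.shape)  # (100, 5) -> (100, 2)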
SequentialFeatureSelector doesn't keep any of the models fitted during cross-validation, so I think you'd need to fit a new model using the selected features:
# Fit the selector
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

selector = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2
).fit(X, y)
print('Selected feature numbers are', selector.get_support(indices=True))

# Use the fitted selector to reduce X to the selected features
X_reduced = selector.transform(X)

# Fit a fresh logreg model on the selected features only
logreg_fitted = LogisticRegression().fit(X_reduced, y)
Alternatively, you can clone the selector's estimator. This ensures consistency with the original estimator, since clone() copies all of its parameters and saves you from needing to specify them manually:
from sklearn.base import clone
best_model = clone(selector.estimator).fit(selector.transform(X), y)
If you want identical models (down to the random seed), you'll also need to set the estimator's random_state and set up the CV appropriately.
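As a rough sketch, reusing X and y from above and pinning random_state on the estimator (random_state is only used by some solvers - 'liblinear', 'sag', 'saga' - so the solver choice here is an assumption):
from sklearn.base import clone

# Pin the seed so refits are reproducible (only used by some solvers)
estimator = LogisticRegression(solver='liblinear', random_state=0)
selector = SequentialFeatureSelector(estimator, n_features_to_select=2, cv=5).fit(X, y)

# clone() returns an unfitted copy with identical parameters, including random_state
best_model = clone(selector.estimator).fit(selector.transform(X), y)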
Upvotes: 0