Reputation: 1481
There is a proposal to implement this in Sklearn (#15075), but in the meantime eli5 is suggested as a solution. However, I'm not sure if I'm using it the right way. This is my code:
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR
import eli5
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
estimator = SVR(kernel="linear")
perm = eli5.sklearn.PermutationImportance(estimator, scoring='r2', n_iter=10, random_state=42, cv=3)
selector = RFECV(perm, step=1, min_features_to_select=1, scoring='r2', cv=3)
selector = selector.fit(X, y)
selector.ranking_
There are a few issues:
1. I am not sure if I am using cross-validation the right way. Should PermutationImportance use cv to validate the importances on a validation set, or should cross-validation be left to RFECV only? (In the example I used cv=3 in both cases, but I'm not sure that's the right thing to do.)
2. If I run eli5.show_weights(perm), I get: AttributeError: 'PermutationImportance' object has no attribute 'feature_importances_'. Is this because I fit using RFECV? What I'm doing is similar to the last snippet here: https://eli5.readthedocs.io/en/latest/blackbox/permutation_importance.html
3. As a less important issue, this gives me a warning when I set cv in eli5.sklearn.PermutationImportance:
.../lib/python3.8/site-packages/sklearn/utils/validation.py:68: FutureWarning: Pass classifier=False as keyword args. From version 0.25 passing these as positional arguments will result in an error warnings.warn("Pass {} as keyword args. From version 0.25 "
The whole process is a bit vague. Is there a way to do it directly in Sklearn, e.g. by adding a feature_importances_ attribute?
Upvotes: 5
Views: 3516
Reputation: 22031
You can run RFECV directly in sklearn by building a custom estimator that computes feature importances, with whatever logic you want, when fit is called.
If you want to compute permutation-based feature importance with an SVR regressor, the estimator to implement is:
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
class SVRExplainerRegressor(SVR):
    def fit(self, X, y):
        # hold out a validation set on which to measure the importances
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.25, random_state=42, shuffle=True
        )
        # fit on the training portion only
        super().fit(X_train, y_train)
        # permutation importance evaluated on the held-out validation set
        self.perm_feature_importances_ = permutation_importance(
            self, X_val, y_val,
            n_repeats=5, random_state=42,
        )['importances_mean']
        # finally refit on all the data before returning
        return super().fit(X, y)
SVRExplainerRegressor does the following:
- splits the data into a training and a validation set
- fits an SVR on the training set
- computes the permutation importances on the validation set and stores them in perm_feature_importances_
- finally refits the SVR on all the data
SVRExplainerRegressor can be used like any sklearn model as RFECV's estimator in this way:
from sklearn.feature_selection import RFECV
from sklearn.datasets import make_friedman1

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
model = SVRExplainerRegressor(kernel='linear')
# point importance_getter at the attribute our custom fit stores
selector = RFECV(model, step=1, min_features_to_select=1,
                 importance_getter='perm_feature_importances_',
                 scoring='r2', cv=3)
selector.fit(X, y)
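Once fitted, the result can be inspected through the standard RFECV attributes, for example:
print(selector.n_features_)  # optimal number of features found
print(selector.support_)     # boolean mask of the selected features
print(selector.ranking_)     # feature ranking (1 = selected)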
This logic can be customized with any estimator (either a regressor or a classifier) and any feature-importance logic (such as SHAP or similar).
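For instance, here is a minimal sketch of the same pattern for a classifier (the class name SVCExplainerClassifier is purely illustrative, not an existing API):

from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

class SVCExplainerClassifier(SVC):
    def fit(self, X, y):
        # hold out a stratified validation set for the importances
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.25, random_state=42, shuffle=True, stratify=y
        )
        super().fit(X_train, y_train)
        # permutation importance on the validation set (scored with the
        # classifier's default metric, accuracy, unless a scorer is passed)
        self.perm_feature_importances_ = permutation_importance(
            self, X_val, y_val, n_repeats=5, random_state=42,
        )['importances_mean']
        # refit on all the data before returning, as RFECV expects
        return super().fit(X, y)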
Upvotes: 1
Reputation: 5174
Since the objective is to select the optimal number of features with permutation importance and recursive feature elimination, I suggest using RFECV and PermutationImportance in conjunction with a CV splitter like KFold. The code could then look like this:
import warnings
from eli5 import show_weights
from eli5.sklearn import PermutationImportance
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.svm import SVR
warnings.filterwarnings("ignore", category=FutureWarning)
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
splitter = KFold(n_splits=3) # 3 folds as in the example
estimator = SVR(kernel="linear")
selector = RFECV(
    PermutationImportance(estimator, scoring='r2', n_iter=10, random_state=42, cv=splitter),
    cv=splitter,
    scoring='r2',
    step=1
)
selector = selector.fit(X, y)
selector.ranking_
show_weights(selector.estimator_)
Regarding your issues:
1. PermutationImportance will calculate the feature importances, and RFECV the r2 scoring, with the same strategy according to the splits provided by KFold.
2. You called show_weights on the unfitted PermutationImportance object. RFECV clones the estimator it is given and fits the clone, so perm itself never gets fitted; that is why you got the error. Access the fitted object through the estimator_ attribute instead (see the snippet below).
3. The FutureWarning can be ignored.
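To make point 2 concrete, using perm and selector from your original code (a minimal sketch):

import eli5
# eli5.show_weights(perm)  # AttributeError: perm was never fitted;
#                          # RFECV fitted an internal clone instead
eli5.show_weights(selector.estimator_)    # the fitted clone kept by RFECV
selector.estimator_.feature_importances_  # the importances live here too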
Upvotes: 3