user15345826

Reputation: 41

Use RFECV in linear regression pipeline

I am trying to use sklearn's RFECV to perform feature selection, so I wrote the code below to identify the optimal number of features.

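from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# `folds` is a CV splitter defined earlier (e.g. a KFold instance); X, y hold the full dataset
rfecv = RFECV(
    estimator=LinearRegression(),
    step=1,                    # drop one feature per elimination round
    cv=folds,
    scoring="r2",
    min_features_to_select=2,
    n_jobs=2,
)
rfecv.fit(X, y)
print(f"Optimal number of features: {rfecv.n_features_}")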
print(f"Optimal number of features: {rfecv.n_features_}")

The output of the code above is 6, which means RFECV recommends selecting 6 features for the best cross-validated performance.
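Besides n_features_, a fitted RFECV also tells you which columns it kept. The sketch below uses synthetic data from make_regression as a stand-in for the question's X and y (an assumption; the real dataset has 215 columns) and reads the support_ mask and ranking_ array.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic stand-in for the question's data (assumption): 20 features, 6 informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=6,
                       noise=10.0, random_state=0)
folds = KFold(n_splits=5, shuffle=True, random_state=0)

rfecv = RFECV(
    estimator=LinearRegression(),
    step=1,
    cv=folds,
    scoring="r2",
    min_features_to_select=2,
)
rfecv.fit(X, y)

# support_ is a boolean mask over the input columns; True marks a kept feature
selected = np.flatnonzero(rfecv.support_)
print("Optimal number of features:", rfecv.n_features_)
print("Selected column indices:", selected)
# ranking_ assigns rank 1 to every selected feature
print("Ranking:", rfecv.ranking_)
```

With a pandas DataFrame as input, X.columns[rfecv.support_] gives the selected column names instead of indices.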

I then tried to use RFECV inside a regression pipeline, as below:

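from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

rfecv = RFECV(
    estimator=LinearRegression(),
    step=1,
    cv=folds,
    scoring="r2",
    min_features_to_select=2,
    n_jobs=2,
)

# step 's' selects features, step 'm' fits the final regressor on the selected features
pipeline = Pipeline(steps=[('s', rfecv), ('m', LinearRegression())])
# fit the model on the training data
pipeline.fit(X_train, y_train)
print(pipeline.n_features_in_)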

pipeline.n_features_in_ returns 215, which is the total number of features in my training dataset. I was expecting it to return 6. Also, with the pipeline above, I don't see any difference in the prediction score on X_test compared with plain linear regression without any feature selection.

I am trying to understand why pipeline.n_features_in_ returns 215 rather than 6. As I understand it, the first step of the pipeline fits on and transforms the X_train dataset, and the transformed dataset is then passed to LinearRegression().
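The data flow described in the previous paragraph can be checked by reproducing Pipeline.fit by hand: fit_transform the selector, then fit the regressor on its output. This is a sketch on synthetic data (make_regression and the KFold settings are assumptions standing in for the question's X_train and folds); it compares the manual two-step fit with the pipeline.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption): 30 features, 6 informative
X, y = make_regression(n_samples=300, n_features=30, n_informative=6,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
folds = KFold(n_splits=5, shuffle=True, random_state=0)

def make_rfecv():
    # hypothetical helper so both variants get an identically configured selector
    return RFECV(estimator=LinearRegression(), step=1, cv=folds,
                 scoring="r2", min_features_to_select=2)

# Manual version: fit_transform step 1, then fit the regressor on the reduced data
selector = make_rfecv()
X_train_sel = selector.fit_transform(X_train, y_train)
model = LinearRegression().fit(X_train_sel, y_train)

# Pipeline version
pipeline = Pipeline(steps=[("s", make_rfecv()), ("m", LinearRegression())])
pipeline.fit(X_train, y_train)

print("columns in / after selection:", X_train.shape[1], X_train_sel.shape[1])
print("regressor sees:", pipeline["m"].n_features_in_)
print("same coefficients:", np.allclose(model.coef_, pipeline["m"].coef_))
```

At predict time the pipeline likewise calls transform on the selector before the regressor, so pipeline.predict(X_test) matches model.predict(selector.transform(X_test)).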

Is my understanding of above pipeline correct?

Upvotes: 0

Views: 528

Answers (1)

afsharov

Reputation: 5164

From the Pipeline documentation:

n_features_in_ : int
Number of features seen during first step fit method.

So n_features_in_ on the pipeline reports the number of features passed to the first step, i.e. before any step has transformed the data. If you want the number of features remaining after RFECV, check, for example, the n_features_in_ attribute of the LinearRegression step:

pipeline['m'].n_features_in_

where pipeline['m'] returns the LinearRegression object.
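To see the different counts side by side, the sketch below fits the same pipeline on synthetic data (make_regression with 25 columns is an assumption standing in for the 215-column training set) and reads the attribute at each level. pipeline['s'].n_features_ is the count RFECV selected, and pipeline[:-1] is the pipeline without its final regressor.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in training data (assumption): 25 columns, 6 informative
X_train, y_train = make_regression(n_samples=200, n_features=25, n_informative=6,
                                   noise=5.0, random_state=1)
folds = KFold(n_splits=5, shuffle=True, random_state=1)

pipeline = Pipeline(steps=[
    ("s", RFECV(estimator=LinearRegression(), step=1, cv=folds,
                scoring="r2", min_features_to_select=2)),
    ("m", LinearRegression()),
])
pipeline.fit(X_train, y_train)

print(pipeline.n_features_in_)        # columns seen by the first step (25 here)
print(pipeline["s"].n_features_)      # number of features RFECV kept
print(pipeline["m"].n_features_in_)   # same number, as seen by the regressor
# slicing off the final step leaves only the selector, whose output has the reduced width
print(pipeline[:-1].transform(X_train).shape)
```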

Upvotes: 2
