Himank Airon

Reputation: 63

f_regression from sklearn.feature_selection

I found the f_regression technique for feature selection in the sklearn.feature_selection module, but I was not able to understand the principle it uses. The description given is:

Univariate linear regression tests.
Quick linear model for testing the effect of a single regressor, sequentially for many regressors. This is done in 3 steps:

    1. The regressor of interest and the data are orthogonalized wrt constant regressors.
    2. The cross correlation between data and regressors is computed.
    3. It is converted to an F score then to a p-value.
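For reference, a minimal call looks something like this (toy data from make_regression, just to show that it returns one F score and one p-value per feature):

    from sklearn.datasets import make_regression
    from sklearn.feature_selection import f_regression

    X, y = make_regression(n_samples=100, n_features=4, random_state=0)
    F_scores, p_values = f_regression(X, y)  # one (F, p-value) pair per feature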

I am not able to understand this; can someone please explain it in layman's terms?

Upvotes: 2

Views: 1921

Answers (1)

Jeremiah Johnson

Reputation: 143

The language in the docs is a little obtuse. I believe 'data' refers to the response. First, the chosen regressor and the response are orthogonalized with respect to the rest of the regressors. This reduces any multicollinearity that may be present. Then, the correlation between the chosen regressor and the response is calculated.

In a univariate setting, the correlation coefficient is the square root of R^2, which can be written in terms of the F-statistic used in testing the overall significance of a model (see also this: https://stats.stackexchange.com/questions/56881/whats-the-relationship-between-r2-and-f-test). So next, the correlation is converted to an F-statistic, the corresponding p-value is calculated, and F and p are returned. If there is more than one regressor, this is done for all regressors one at a time.
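To make the three quoted steps concrete, here is a rough sketch (not sklearn's actual source; the make_regression data is just an arbitrary example). It centers each regressor and the response (orthogonalizing against the constant regressor), computes the correlations, and converts them to F scores and p-values; with the default settings this should match what f_regression returns:

    import numpy as np
    from scipy import stats
    from sklearn.datasets import make_regression
    from sklearn.feature_selection import f_regression

    X, y = make_regression(n_samples=200, n_features=5, n_informative=2,
                           noise=10.0, random_state=0)

    # Step 1: orthogonalize against the constant regressor, i.e. center X and y.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Step 2: correlation between each centered regressor and the centered response.
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()))

    # Step 3: convert r to an F score with (1, n - 2) degrees of freedom,
    # then to a p-value from the F distribution.
    n = X.shape[0]
    dof = n - 2
    F_manual = r ** 2 / (1.0 - r ** 2) * dof
    p_manual = stats.f.sf(F_manual, 1, dof)

    # Compare with sklearn; both should agree up to floating-point error.
    F_sklearn, p_sklearn = f_regression(X, y)
    print(np.allclose(F_manual, F_sklearn), np.allclose(p_manual, p_sklearn))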

Upvotes: 3
