Alex

Reputation: 4264

Sklearn Univariate Selection: Features are Constant

I am getting the following warning message when trying to use Feature Selection and f_classif (ANOVA test) on some data in sklearn:

C:\Users\Alexander\Anaconda3\lib\site-packages\sklearn\feature_selection\univariate_selection.py:113: UserWarning: Features ... are constant. UserWarning)

The features that the warning message indicated were constant apparently had p-values of 0. I was unable to find any information about what was causing this warning. The github file for this particular function is here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/univariate_selection.py

Any help would be appreciated, thanks.

Upvotes: 9

Views: 7820

Answers (1)

Jasmin Rueegger

Reputation: 81

The warning lists the constant features by their positional index. To get a feature's name, use that index on the columns of your X: X_train.columns[yourindex]
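As a minimal sketch of that lookup, assuming X_train is a pandas DataFrame. The data and the index list here are made up for illustration; in practice the indices come from your own warning message:

```python
import pandas as pd

# Hypothetical training frame; columns "b" and "d" hold a single value throughout.
X_train = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [5.0, 5.0, 5.0, 5.0],
    "c": [0.0, 1.0, 0.0, 1.0],
    "d": [7.0, 7.0, 7.0, 7.0],
})

# Positional indices copied from the warning text (assumed values for this example).
warned_indices = [1, 3]

# Map the positions back to column names.
constant_names = list(X_train.columns[warned_indices])

# Sanity check: each of those columns should contain exactly one distinct value.
n_unique = X_train[constant_names].nunique()
print(constant_names)
print(n_unique)
```

Checking nunique() on the named columns is a quick way to confirm that they really are constant in your data before dropping them.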

Then you can either drop this feature manually, or you can use VarianceThreshold to remove all zero-variance features:

    from sklearn.feature_selection import VarianceThreshold

    # Fit on the training data only; threshold=0 flags zero-variance columns.
    constant_filter = VarianceThreshold(threshold=0)
    constant_filter.fit(X_train)

    # Collect the dropped column names first: transform() returns a NumPy array by default.
    constant_columns = X_train.columns[~constant_filter.get_support()]

    # Apply the same fitted filter to both sets.
    X_train = constant_filter.transform(X_train)
    X_test = constant_filter.transform(X_test)

    for column in constant_columns:
        print("Removed ", column)

You have to determine the zero-variance features on the training DataFrame only, because a feature that is constant there may still take more than one value across your overall df. Then remove the feature from both the training and the test df.
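If you would rather not track the columns by hand, one option is to chain the filter and the ANOVA selection in a Pipeline, so the filter is always fit on the training data and reused for the test data. This is only a sketch of that alternative, not part of the approach above; the toy data, column names, and k=1 are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.pipeline import Pipeline

# Hypothetical data: "constant" has zero variance in the training set.
X_train = pd.DataFrame({
    "signal": [0.1, 0.2, 0.3, 1.1, 1.2, 1.3],
    "noise": [0.5, -0.3, 0.2, -0.1, 0.4, 0.0],
    "constant": [3.0] * 6,
})
y_train = np.array([0, 0, 0, 1, 1, 1])

X_test = pd.DataFrame({
    "signal": [0.15, 1.25],
    "noise": [0.1, -0.2],
    "constant": [3.0, 4.0],  # varies here, but the filter was fit on train
})

selector = Pipeline([
    ("drop_constant", VarianceThreshold(threshold=0.0)),  # removes zero-variance columns
    ("anova", SelectKBest(f_classif, k=1)),               # ANOVA F-test on what remains
])

# Fit on the training data only, then apply the same fitted steps to both sets.
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Columns that survived the variance filter.
kept = X_train.columns[selector.named_steps["drop_constant"].get_support()]
print(list(kept))
```

Because f_classif only ever sees the columns that passed the variance filter, the constant feature never reaches the ANOVA test in this setup.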

Upvotes: 8
