Shuvayan Das

Reputation: 1048

Passing an array with the index of a discrete feature to mutual_info_classif in Python

I am using sklearn.feature_selection.mutual_info_classif to calculate the mutual information (MI) between 4 continuous variables (the X matrix) and y (the target class).

X:

prop_tenure  prop_12m  prop_6m  prop_3m
0.04         0.04      0.06     0.08
0            0         0        0
0            0         0        0
0.06         0.06      0.1      0
0.38         0.38      0.25     0
0.61         0.61      0.66     0.61
0.01         0.01      0.02     0.02
0.1          0.1       0.12     0.16
0.04         0.04      0.04     0.09
0.22         0.22      0.22     0.22
0.72         0.72      0.73     0.72
0.39         0.39      0.45     0.64

y:

status
0
0
1
1
0
0
0
1
0
0
0
1

So all the columns of my X are continuous and y is discrete.

The function has a parameter that accepts the indices of discrete features:

sklearn.feature_selection.mutual_info_classif(X, y, discrete_features='auto', n_neighbors=3, copy=True, random_state=None)

and I am calling it as below:

print(mutual_info_classif(X, y, discrete_features=[3], n_neighbors=20))
[0.12178862 0.12968448 0.15483147 0.14721018]

Although this does not raise an error, I am not sure whether I am passing the right index to mark the y variable as discrete and the others as continuous.

Can someone please clarify whether I am doing this wrong?

Upvotes: 1

Views: 1081

Answers (2)

Priyanka Mohandas

Reputation: 41

The discrete_features parameter specifies whether the features (the columns of X) should be treated as discrete or continuous; it does not describe y. mutual_info_classif always treats y as discrete. Since you are computing the MI of continuous features, you should leave it at 'auto', which treats the features of a dense X as continuous.

Upvotes: 0

Jan K

Reputation: 4150

The function mutual_info_classif already assumes that your target y is discrete, so there is no need to pass any index. The following is enough:

mutual_info_classif(X, y)

Note that the default discrete_features='auto' treats all features as continuous when X is a dense array and as discrete when X is sparse, so with your dense X all four features are handled as continuous.

Also, your example is wrong: the indices in discrete_features refer to columns of X, so passing discrete_features=[3] makes the algorithm treat the 4th feature (prop_3m) as discrete. It has no effect on how y is handled.
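As a rough check of the point about 'auto', here is a minimal sketch that rebuilds the X and y from the question as NumPy arrays and compares discrete_features='auto' with an explicit discrete_features=False. The variable names (mi_auto, mi_cont) and random_state=0 are my own choices for illustration. The estimator adds a small amount of random noise to continuous features, so a fixed random_state is needed to make the two calls comparable.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Data copied from the question: 12 rows, 4 continuous features.
X = np.array([
    [0.04, 0.04, 0.06, 0.08],
    [0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00],
    [0.06, 0.06, 0.10, 0.00],
    [0.38, 0.38, 0.25, 0.00],
    [0.61, 0.61, 0.66, 0.61],
    [0.01, 0.01, 0.02, 0.02],
    [0.10, 0.10, 0.12, 0.16],
    [0.04, 0.04, 0.04, 0.09],
    [0.22, 0.22, 0.22, 0.22],
    [0.72, 0.72, 0.73, 0.72],
    [0.39, 0.39, 0.45, 0.64],
])
y = np.array([0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1])  # discrete target

# 'auto' on a dense array: every feature is treated as continuous.
mi_auto = mutual_info_classif(X, y, discrete_features="auto", random_state=0)

# Explicitly declaring all features continuous; with the same seed
# this is expected to give the same estimates as 'auto'.
mi_cont = mutual_info_classif(X, y, discrete_features=False, random_state=0)

print(mi_auto)
print(np.allclose(mi_auto, mi_cont))
```

With only 12 samples the estimates are noisy and change with random_state, so the exact numbers should not be read too closely.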

Upvotes: 2
