Reputation: 1048
I am using mutual_info_classif from sklearn.feature_selection to calculate the mutual information (MI) between 4 continuous variables (the X matrix) and y (the target class).
X:
prop_tenure  prop_12m  prop_6m  prop_3m
0.04         0.04      0.06     0.08
0.00         0.00      0.00     0.00
0.00         0.00      0.00     0.00
0.06         0.06      0.10     0.00
0.38         0.38      0.25     0.00
0.61         0.61      0.66     0.61
0.01         0.01      0.02     0.02
0.10         0.10      0.12     0.16
0.04         0.04      0.04     0.09
0.22         0.22      0.22     0.22
0.72         0.72      0.73     0.72
0.39         0.39      0.45     0.64
**y**
status
0
0
1
1
0
0
0
1
0
0
0
1
So my X is all continuous and y is discrete.
There is a parameter in the function to which I can pass the indices of discrete features:
sklearn.feature_selection.mutual_info_classif(X, y, discrete_features='auto', n_neighbors=3, copy=True, random_state=None)
and I am doing as below:
print(mutual_info_classif(X, y, discrete_features=[3], n_neighbors=20))
[0.12178862 0.12968448 0.15483147 0.14721018]
Although this does not raise an error, I am not sure whether I am passing the right index to mark the y variable as discrete and the others as continuous.
Can someone please clarify if I am wrong?
Upvotes: 1
Views: 1081
Reputation: 41
The parameter discrete_features specifies which of your features (columns of X) should be treated as discrete; it says nothing about y, which mutual_info_classif always treats as discrete. Since all of your features are continuous, you should leave discrete_features='auto' (or pass False) for correct results.
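A minimal sketch of this behavior, assuming scikit-learn is installed (the data here is synthetic, made up for illustration). Because X is a dense all-continuous array, discrete_features='auto' resolves to False, so the two calls below agree:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.RandomState(0)
X = rng.rand(100, 4)             # four continuous features
y = (X[:, 0] > 0.5).astype(int)  # discrete target derived from feature 0

# y is always treated as discrete; discrete_features only describes X.
# For a dense array, 'auto' means "all features continuous", i.e. False:
mi_auto = mutual_info_classif(X, y, discrete_features='auto', random_state=0)
mi_cont = mutual_info_classif(X, y, discrete_features=False, random_state=0)

print(mi_auto)  # feature 0 should score highest, since y is built from it
```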
Upvotes: 0
Reputation: 4150
The function mutual_info_classif already assumes your target y is discrete, so there is no need to pass any index; the following is enough:
mutual_info_classif(X, y)
Note that the default discrete_features='auto' automatically treats all your features as continuous, since X is a dense array.
Also, your call is wrong: passing discrete_features=[3] makes the algorithm treat the 4th feature (prop_3m) as discrete.
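To illustrate with the data from the question (a sketch, assuming scikit-learn; the kNN-based estimator is noisy on only 12 rows, so the exact scores are not meaningful):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# X and y copied from the question.
X = np.array([
    [0.04, 0.04, 0.06, 0.08],
    [0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00],
    [0.06, 0.06, 0.10, 0.00],
    [0.38, 0.38, 0.25, 0.00],
    [0.61, 0.61, 0.66, 0.61],
    [0.01, 0.01, 0.02, 0.02],
    [0.10, 0.10, 0.12, 0.16],
    [0.04, 0.04, 0.04, 0.09],
    [0.22, 0.22, 0.22, 0.22],
    [0.72, 0.72, 0.73, 0.72],
    [0.39, 0.39, 0.45, 0.64],
])
y = np.array([0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1])

# Correct call for all-continuous features: just use the defaults.
# y does not need to be flagged anywhere -- it is always treated as discrete.
mi = mutual_info_classif(X, y, random_state=0)
print(mi)  # one non-negative MI estimate per column of X
```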
Upvotes: 2