I am working on feature selection with the NSL-KDD dataset. After preprocessing, my feature matrix X_newDoS has this target type:
type_of_target(X_newDoS)
'continuous-multioutput'
and the labels Y_DoS have:
type_of_target(Y_DoS)
'unknown'
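For reference, both type_of_target results above can be reproduced with tiny toy arrays. This is a sketch under an assumption: that the labels are integers held in an object-dtype array (the toy values are made up, not NSL-KDD data). sklearn reports 'unknown' for an object array whose first element is not a string.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Toy stand-ins for X_newDoS and Y_DoS (hypothetical values, not NSL-KDD data).
X_toy = np.array([[0.1, 2.5], [0.7, 0.0], [1.3, 4.2]])
# Integer labels stored with dtype=object (assumed cause, e.g. left over from pandas).
y_toy = np.array([0, 1, 1], dtype=object)

x_type = type_of_target(X_toy)  # 2-D float array containing non-integer values
y_type = type_of_target(y_toy)  # object dtype whose elements are not strings

print(x_type)  # continuous-multioutput
print(y_type)  # unknown
```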
I run the feature selection step like this:
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_jobs=2)
rfe = RFE(clf, n_features_to_select=1)
rfe.fit(X_newDoS, Y_DoS)
The error message:
ValueError                                Traceback (most recent call last)
<ipython-input-31-6c22f9cc2bba> in <module>()
     12 rfe = RFE(clf, n_features_to_select=1)
---> 13 rfe.fit(X_newDoS, Y_DoS)
     14
4 frames
/usr/local/lib/python3.6/dist-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    167     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    168                       'multilabel-indicator', 'multilabel-sequences']:
--> 169         raise ValueError("Unknown label type: %r" % y_type)
    170
ValueError: Unknown label type: 'unknown'
X_newDoS is a NumPy array and Y_DoS is an array of shape (125972, 2). Opening multiclass.py, I saw that 'unknown' is not in the list of accepted target types. I tried converting Y_DoS into a NumPy array with:
Y_DoS = np.array(Y_DoS)
but type_of_target still reports 'unknown', so check_classification_targets in multiclass.py still rejects it. How can I solve this? How do I convert Y_DoS to a type that multiclass.py recognizes without losing its contents or structure? For reference, I used the code from this notebook and followed the same preprocessing steps: https://github.com/CynthiaKoopman/Network-Intrusion-Detection/blob/master/DecisionTree_IDS.ipynb
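A likely reason np.array(Y_DoS) changes nothing is that wrapping an existing array in np.array copies it but keeps its dtype, so an object array stays object. The sketch below uses hypothetical toy labels that stand in for an object-dtype Y_DoS. Checking .dtype directly is a quick way to spot this.

```python
import numpy as np

# Toy labels held as Python objects (hypothetical; mirrors an object-dtype Y_DoS).
y_obj = np.array([0, 1, 1, 0], dtype=object)

# np.array() copies the data but keeps the existing dtype, so nothing changes.
y_wrapped = np.array(y_obj)
print(y_wrapped.dtype)  # object

# Inspecting the dtype directly reveals the problem sklearn runs into.
print(y_wrapped.dtype == object)  # True
```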
I am pretty new to machine learning. The program worked fine with numpy 1.11.3, sklearn 0.18.1 and pandas 0.19.2. With the library versions currently preinstalled on Colab (numpy 1.16.3, sklearn 0.21.1, pandas 0.24.2), it raises the error above.
Never mind. It turned out that Y_DoS had dtype object, so sklearn could not infer its target type. Adding
Y_DoS = Y_DoS.astype('int')
before the fitting step solved the problem, and type_of_target now classifies Y_DoS as 'binary'.
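A minimal end-to-end sketch of the fix, using synthetic data (hypothetical, not NSL-KDD) with binary labels deliberately stored as object. After the astype('int') cast, the same RFE plus RandomForestClassifier setup fits without the ValueError. n_estimators and random_state are choices made only for this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.utils.multiclass import type_of_target

rng = np.random.default_rng(0)

# Synthetic stand-ins (hypothetical data): 200 samples, 5 features.
X = rng.normal(size=(200, 5))
# Binary labels driven by feature 2, deliberately stored with dtype=object.
y = (X[:, 2] > 0).astype(int).astype(object)
before = type_of_target(y)  # 'unknown': RFE.fit would raise ValueError here

# The fix from the answer: cast the labels to a numeric dtype before fitting.
y = y.astype('int')
after = type_of_target(y)   # 'binary'

clf = RandomForestClassifier(n_estimators=50, n_jobs=2, random_state=0)
rfe = RFE(clf, n_features_to_select=1)
rfe.fit(X, y)

# ranking_ is 1 for the selected feature; higher ranks were eliminated earlier.
print(before, after)
print(rfe.ranking_)
print(rfe.support_)
```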