Reputation: 2628
I am building a program that assigns multiple labels/tags to textual descriptions. I am using OneVsRestClassifier to label my textual descriptions. xTrain, xTest, and yTrain are all numpy.ndarray. The error seems strange, considering that I have split the training and test data in the correct manner. Below is my code:
from scipy.sparse import csr_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.2)
nb_clf = MultinomialNB()
sgd = SGDClassifier()
lr = LogisticRegression()
mn = MultinomialNB()
print("xTrain.shape = " + str(xTrain.shape))
print("xTest.shape = " + str(xTest.shape))
print("yTrain.shape = " + str(yTrain.shape))
print("yTest.shape = " + str(yTest.shape))
print("type(xTrain) = " + str(type(xTrain)))
print("type(xTest) = " + str(type(xTest)))
# convert the sparse matrices to dense arrays
xTrain = csr_matrix(xTrain).toarray()
xTest = csr_matrix(xTest).toarray()
yTrain = csr_matrix(yTrain).toarray()
print("type(xTrain) = " + str(type(xTrain)))
for classifier in [nb_clf, sgd, lr, mn]:
    clf = OneVsRestClassifier(classifier)
    clf.fit(xTrain.astype("U"), yTrain.astype("U"))  # ValueError raised here
    y_pred = clf.predict(xTest)
    print("\ny_pred:")
    print(y_pred)
x output:
(1466, 1292) 0.13531037414782607
(1466, 1238) 0.21029405543816293
(1466, 988) 0.04688335706505732
...
...
y output:
[[0 0 0 ... 1 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 1 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]
print statements output:
xTrain.shape = (1173, 13817)
xTest.shape = (294, 13817)
yTrain.shape = (1173, 28)
yTest.shape = (294, 28)
type(xTrain) = <class 'scipy.sparse.csr.csr_matrix'>
type(xTest) = <class 'scipy.sparse.csr.csr_matrix'>
type(xTrain) = <class 'numpy.ndarray'>
type(xTest) = <class 'numpy.ndarray'>
type(yTrain) = <class 'numpy.ndarray'>
error (at the clf.fit line):
ValueError: Multioutput target data is not supported with label binarization
Upvotes: 0
Views: 2368
Reputation: 126
Please first clarify the feature dimensionality and sample size in your program. For the target feature (y), the labels should not be one-hot encoded. For example, instead of [0 0 0 1], it should be [3].
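As a minimal sketch of that conversion (assuming each row of y marks exactly one class, i.e. a single-label target), argmax over each row recovers the integer class index:

```python
import numpy as np

# Hypothetical one-hot encoded targets: each row marks a single class
y_onehot = np.array([
    [0, 0, 0, 1],   # class 3
    [1, 0, 0, 0],   # class 0
    [0, 1, 0, 0],   # class 1
])

# argmax over axis 1 turns each one-hot row into its integer label
y_labels = y_onehot.argmax(axis=1)
print(y_labels)  # [3 0 1]
```

Note this only applies when every sample has a single label; a genuinely multi-label target (several 1s per row) cannot be collapsed this way.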
Upvotes: 1