Classifier fit and predict on the same data give different results

Question

I am training a classifier with sklearn and something is going wrong. In the code below I pass exactly the same data to fit and to predict, yet the predicted labels do not match the training labels. How can this happen?

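import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Bag-of-words counts -> TF-IDF weights -> multinomial naive Bayes
text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB()),
])
text_clf = text_clf.fit(X, y)

# Predict on the same data the pipeline was fitted on
predicted = text_clf.predict(X)

print(set(np.asarray(y)) == set(predicted))  # gives False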

X is a list of unicode strings and y is a list of numbers (1 and 0).
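
A note on the check itself: comparing set(y) with set(predicted) only compares which distinct labels occur, not whether each sample was predicted correctly. It comes out False whenever the classifier never predicts one of the classes, and it can come out True even if every single prediction is wrong. A fitted classifier is also not guaranteed to reproduce its training labels. Below is a minimal sketch of an element-wise comparison. The arrays y_true and y_pred are made-up stand-ins for y and text_clf.predict(X), not the actual data from the question.

```python
import numpy as np

# Hypothetical labels and predictions (assumed for illustration only);
# in the question these would be y and text_clf.predict(X).
y_true = np.array([0, 0, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0, 0])

# Set comparison: only checks which distinct labels appear at all.
same_label_set = set(y_true.tolist()) == set(y_pred.tolist())

# Element-wise comparison: checks every sample individually.
matches = y_true == y_pred
accuracy = float(np.mean(matches))       # fraction of correct predictions
mismatch_idx = np.flatnonzero(~matches)  # positions where prediction != label

print(same_label_set)         # False: label 1 is never predicted
print(accuracy)               # 0.8
print(mismatch_idx.tolist())  # [4]
```

With real data, looking at the samples at mismatch_idx, or at how often each class appears in y, may show whether the classifier is simply predicting one class for everything.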
