Reputation: 5173
I'm building a model clf, say:

clf = MultinomialNB()
clf.fit(x_train, y_train)

Then I check the model's accuracy on the training data using score:

clf.score(x_train, y_train)

The result was 0.92. My goal is to evaluate against the test set, so I use:

clf.score(x_test, y_test)

This gave me 0.77, so I thought it would give me the same result as the code below:

clf.fit(X_train, y_train).score(X_test, y_test)

But this gave me 0.54. Can someone help me understand why 0.77 > 0.54?
Upvotes: 6
Views: 25334
Reputation: 26572
You must get the same result if x_train, y_train, x_test, and y_test are the same in both cases. Here is an example using the iris dataset; as you can see, both methods give the same result.
>>> from sklearn.naive_bayes import MultinomialNB
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.datasets import load_iris
# prepare dataset
>>> iris = load_iris()
>>> X = iris.data[:, :2]
>>> y = iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# model
>>> clf1 = MultinomialNB()
>>> clf2 = MultinomialNB()
>>> print(id(clf1), id(clf2))  # two different instances
4337289232 4337289296
>>> clf1.fit(X_train, y_train)
>>> print(clf1.score(X_test, y_test))
0.633333333333
>>> print(clf2.fit(X_train, y_train).score(X_test, y_test))
0.633333333333
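One likely cause of the discrepancy in the question is that the lowercase x_train/x_test and the uppercase X_train/X_test are different variables, e.g. produced by two separate calls to train_test_split without a fixed random_state, so the two score calls evaluate on different data. This is a hypothetical reconstruction of the question's setup, not confirmed by the poster; a minimal sketch:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data[:, :2], iris.target

# Two separate splits with no fixed random_state generally yield
# different partitions of the same data.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2)

clf = MultinomialNB().fit(x_train, y_train)
score_a = clf.score(x_test, y_test)

# Refitting on the other split and scoring on its test set
# can give a different accuracy, because both the training
# and the evaluation data changed.
score_b = clf.fit(X_train, Y_train).score(X_test, Y_test)
print(score_a, score_b)
```

With a fixed random_state (or by reusing one split for both calls), the two scores coincide, which is what the iris example above demonstrates.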
Upvotes: 7