Reputation: 73
I split the data into train and test sets, both of which contain the target values 0 and 1. But after fitting and predicting with an SVM, the classification report states that there are zero 0s in the test sample, which is not true.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
df = pd.DataFrame(data=data['data'], columns=data['feature_names'])
x = df
y = data['target']
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.3, random_state=42)
As you can see below, the test has 0s and 1s but the support in the classification report states that there aren't any 0s!
!(https://i.sstatic.net/n2uUM.png)
Upvotes: 3
Views: 4537
Reputation: 60318
(It is always a good idea to include your relevant code in the example, and not in images)
> the classification report states that there are Zero '0's in the test sample which is not true.
This is because, judging from the code in the linked image, you have switched the arguments of classification_report; you have used:
print(classification_report(pred, ytest)) # wrong order of arguments
which indeed gives:
             precision    recall  f1-score   support
    class 0       0.00      0.00      0.00         0
    class 1       1.00      0.63      0.77       171
avg / total       1.00      0.63      0.77       171
but the correct usage (see the docs) is
print(classification_report(ytest, pred)) # ytest first
which gives
             precision    recall  f1-score   support
    class 0       0.00      0.00      0.00        63
    class 1       0.63      1.00      0.77       108
avg / total       0.40      0.63      0.49       171
along with the following warning message:
C:\Users\Root\Anaconda3\envs\tensorflow1\lib\site-packages\sklearn\metrics\classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. 'precision', 'predicted', average, warn_for)
because, as already pointed out in the comments, you predict only 1's:
pred
# result:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
the reason for which is another story, and not part of the current question.
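To see why the argument order changes the support column, here is a minimal pure-Python sketch of how the per-class numbers are derived. It is not scikit-learn's implementation; `per_class_report` is a hypothetical helper written only for illustration, and the label lists are synthetic stand-ins built from the counts shown above (63 zeros and 108 ones in the test set, 171 predictions that are all 1). The key point is that support is counted from the first argument, which the function treats as the true labels.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Hypothetical helper: per-class precision, recall and support."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # occurrences of each label in the FIRST argument
    predicted = Counter(y_pred)    # occurrences of each label in the SECOND argument
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    report = {}
    for label in labels:
        # Undefined ratios are set to 0.0, mirroring the warning quoted above.
        precision = hits[label] / predicted[label] if predicted[label] else 0.0
        recall = hits[label] / support[label] if support[label] else 0.0
        report[label] = {
            "precision": round(precision, 2),
            "recall": round(recall, 2),
            "support": support[label],
        }
    return report


# Synthetic stand-ins with the same class counts as in the question.
ytest = [0] * 63 + [1] * 108
pred = [1] * 171

right = per_class_report(ytest, pred)   # true labels first
wrong = per_class_report(pred, ytest)   # arguments swapped

print(right)
print(wrong)
```

With the arguments swapped, the all-ones predictions are treated as the ground truth, so class 0 gets a support of 0, and precision and recall for class 1 trade places.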
Here is the complete reproducible code for the above:
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)
xtrain,xtest,ytrain,ytest = train_test_split(X,y,test_size=0.3,random_state=42)
from sklearn.svm import SVC
svc=SVC()
svc.fit(xtrain, ytrain)
pred = svc.predict(xtest)
print(classification_report(ytest, pred))
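As a possible safeguard against swapping the arguments again, the metric functions can be called with keyword arguments, and the label counts can be inspected before reading the report. The sketch below uses small toy lists as stand-ins for ytest and pred (they are assumptions for illustration, not the real data), and the `zero_division` parameter assumes a reasonably recent scikit-learn release (0.22 or later).

```python
from collections import Counter

from sklearn.metrics import classification_report, confusion_matrix

# Toy stand-ins for ytest and pred (illustrative values only).
y_true = [0, 0, 1, 1, 1]
y_pred = [1, 1, 1, 1, 1]

# Count how often each label occurs in the true labels and in the predictions.
true_counts = Counter(y_true)
pred_counts = Counter(y_pred)
print("true labels:", dict(true_counts))
print("predictions:", dict(pred_counts))

# Keyword arguments make the role of each array explicit.
# Rows of the confusion matrix are true classes, columns are predicted classes.
cm = confusion_matrix(y_true=y_true, y_pred=y_pred)
print(cm)

# zero_division=0 silences the UndefinedMetricWarning for never-predicted classes.
report = classification_report(y_true=y_true, y_pred=y_pred, zero_division=0)
print(report)
```

An all-zero first column in the confusion matrix is a quick sign that the model never predicts class 0, independent of how the report is read.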
Upvotes: 6