I use the code below to get the confusion matrix and the classification report of the classification model, but the result changes with each run! Why does it happen and how can I fix it?
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Load the bank-note data; 'Class' is the label column
bankdata = pd.read_csv("bill_authentication.csv")
X = bankdata.drop('Class', axis=1)
y = bankdata['Class']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# Train an SVM with an RBF kernel and predict on the held-out rows
svclassifier = SVC(kernel='rbf')
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)

# Evaluate the predictions
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
You need to set random_state for train_test_split. Without it, each run shuffles the data differently, which produces a different train/test split. Your classifier therefore gets different input on every run, which can (and in your case does) lead to different results.
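A small sketch of this effect, using made-up toy arrays instead of the CSV (an assumption for illustration): two calls without random_state return different test sets, while two calls with the same random_state return identical ones.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the bank-note features (made up for illustration)
X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

# No random_state: the shuffle draws from NumPy's global RNG, so each call differs
_, X_test_a, _, _ = train_test_split(X, y, test_size=0.20)
_, X_test_b, _, _ = train_test_split(X, y, test_size=0.20)

# Fixed random_state: the shuffle is repeatable
_, X_test_c, _, _ = train_test_split(X, y, test_size=0.20, random_state=1)
_, X_test_d, _, _ = train_test_split(X, y, test_size=0.20, random_state=1)

print(np.array_equal(X_test_a, X_test_b))  # False in practice (a repeat is extremely unlikely)
print(np.array_equal(X_test_c, X_test_d))  # True
```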
For example, if you set random_state to a fixed value, you will get the same results between runs, so change the splitting line to the one below. The precise value you set random_state to does not matter as long as it stays the same between runs.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)
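To check the fix end to end, here is a minimal sketch that runs the same pipeline twice with a fixed split seed and compares the confusion matrices. It assumes synthetic data from make_classification in place of bill_authentication.csv (4 features, 2 classes), and evaluate is a hypothetical helper, not part of the original code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for bill_authentication.csv (assumption: 4 features, 2 classes)
X, y = make_classification(n_samples=500, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)

def evaluate(split_seed):
    """Hypothetical helper: split, fit an RBF SVC, return the confusion matrix."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, random_state=split_seed)
    clf = SVC(kernel='rbf')
    clf.fit(X_train, y_train)
    return confusion_matrix(y_test, clf.predict(X_test))

first = evaluate(1)
second = evaluate(1)
print(first)
print(np.array_equal(first, second))  # True: same split, same model, same matrix
```

Because the only source of randomness in this pipeline is the split, fixing its seed is enough to make the whole report repeatable.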
You can also set random_state for SVC, but it only plays a part if the probability argument is set to True. By default it is False, so it should not influence your case.
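A sketch illustrating this, again on synthetic make_classification data (an assumption, not the original CSV): with the default probability=False, two SVCs built with different random_state values give identical predictions; with probability=True, fixing random_state makes predict_proba repeatable across fits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data in place of the CSV (assumption)
X, y = make_classification(n_samples=200, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)

# probability=False (the default): random_state is ignored, predictions match
pred_a = SVC(kernel='rbf', random_state=0).fit(X, y).predict(X)
pred_b = SVC(kernel='rbf', random_state=42).fit(X, y).predict(X)
print(np.array_equal(pred_a, pred_b))  # True

# probability=True: random_state seeds the internal cross-validation used to
# calibrate predict_proba, so fix it to get repeatable probabilities
proba_a = SVC(kernel='rbf', probability=True, random_state=0).fit(X, y).predict_proba(X)
proba_b = SVC(kernel='rbf', probability=True, random_state=0).fit(X, y).predict_proba(X)
print(np.allclose(proba_a, proba_b))  # True
```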