Classification report changes with each run

Question

I use the code below to get the confusion matrix and the classification report of the classification model, but the result changes with each run! Why does it happen and how can I fix it?

import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

bankdata = pd.read_csv("bill_authentication.csv")

X = bankdata.drop('Class', axis=1)
y = bankdata['Class']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

svclassifier = SVC(kernel='rbf')
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)

print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

error · Accepted Answer

You need to set random_state for train_test_split. Without it, each run uses a different random state, which produces a different train/test split. The classifier is then trained and evaluated on different data, which can (and in your case does) change the outcome.

For example, if you set random_state to a fixed value, you will get the same results between runs, so change that line of code to the one below. The precise value you set random_state to does not matter as long as it is the same between runs.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)

You can also set random_state for SVC, but it only plays a part if the probability argument is set to True. Since probability defaults to False, it should not influence your case.
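
If you want to check this yourself, here is a minimal sketch. It assumes a synthetic dataset from make_classification as a stand-in for bill_authentication.csv (which is not available here), and run_once is a hypothetical helper name, not part of scikit-learn. It repeats the split and fit twice with the same seed and compares the confusion matrices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: synthetic 4-feature binary data standing in for the asker's CSV.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

def run_once(seed=None):
    """Split the data, fit an RBF SVC, and return the test confusion matrix."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, random_state=seed
    )
    clf = SVC(kernel='rbf')  # probability=False by default
    clf.fit(X_train, y_train)
    return confusion_matrix(y_test, clf.predict(X_test))

# Same seed twice: identical split, so identical confusion matrix.
cm_a = run_once(seed=1)
cm_b = run_once(seed=1)
print(np.array_equal(cm_a, cm_b))

# No seed: the split is drawn fresh on each call, so the matrix may differ.
print(run_once())
print(run_once())
```

With the seed fixed, the two matrices should match. The two unseeded calls at the end reproduce the behaviour from the question and will usually (though not necessarily always) give different numbers.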
