Ok_good

Reputation: 1

Why does my machine learning model give a different accuracy value on every run?

I have this Python code that predicts trade calls from the Bollinger Band values and the close price.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

lr1 = LogisticRegression()
x = df[['Lower_Band','Upper_Band','MA_14','Close Price']]
y = df['Call']
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3)
lr1.fit(x_train,y_train)
y_pred = lr1.predict(x_test)
print("Accuracy=",accuracy_score(y_test,y_pred,normalize=True))

Each time I run this code, a different accuracy value is printed. The values range anywhere from 0.3 to 0.8. How do I get a reliable estimate of this model's accuracy? Is there something wrong with my code?

Upvotes: 0

Views: 882

Answers (2)

afsharov

Reputation: 5164

As KolaB describes, you should set the random_state parameter of train_test_split to make your results reproducible. However, you mention that your accuracy varies between 0.3 and 0.8. That is a strong indicator that the score depends heavily on which samples happen to land in the test set. I would therefore suggest using k-fold cross-validation as a countermeasure.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

lr1 = LogisticRegression()
x = df[['Lower_Band','Upper_Band','MA_14','Close Price']]
y = df['Call']
print(f'Accuracy = {cross_val_score(lr1, x, y, cv=5).mean()}')

cross_val_score returns an array of 5 scores, one per train/test iteration, arranged so that each sample is used in the test set exactly once. Averaging these 5 scores gives a better estimate of your model's performance than a single split does.
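To see how much the individual folds differ, it can help to print the per-fold scores and their spread along with the mean. The following is only a minimal sketch: the question's df is not available here, so it assumes a synthetic dataset from make_classification with four numeric features and a binary label as a stand-in, and it raises max_iter as a precaution against convergence warnings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the question's df (assumption: 4 numeric
# features and a binary label; replace X and y with your own data).
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

model = LogisticRegression(max_iter=1000)

# One accuracy score per fold; with cv=5 every sample is tested exactly once.
scores = cross_val_score(model, X, y, cv=5)

print("Per-fold accuracy:", scores)
print(f"Mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

A large standard deviation across folds would point to the same problem as the 0.3 to 0.8 range in the question: the score is sensitive to which samples end up in the test set.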

Upvotes: 1

KolaB

Reputation: 501

Your problem is most probably in train_test_split. You are not setting random_state, so the data is shuffled differently on every run and you get a different train/test split each time. To get reproducible results, change that line to:

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3, random_state=1)

Also see the scikit-learn documentation for the train_test_split function.
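The effect of random_state can be checked directly. This is a rough sketch rather than the asker's actual setup: it assumes synthetic data from make_classification in place of df, and holdout_accuracy is a hypothetical helper introduced only for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's df (assumption, not the real data).
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)


def holdout_accuracy(seed):
    """Fit on a 70/30 split drawn with the given seed and return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Same seed every time: the split, and therefore the score, is identical.
same_seed = [holdout_accuracy(1) for _ in range(3)]

# Different seeds: each run sees a different split, so the score can change.
different_seeds = [holdout_accuracy(seed) for seed in range(5)]

print("Same seed:      ", same_seed)
print("Different seeds:", different_seeds)
```

Note that fixing the seed only makes a single split repeatable. It does not tell you whether that particular split is representative of the data.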

Upvotes: 0
