I have this Python code that predicts trade calls from the Bollinger Band values and the close price:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# df is a pandas DataFrame with the Bollinger Band columns and a 'Call' label (loaded elsewhere)
lr1 = LogisticRegression()
x = df[['Lower_Band','Upper_Band','MA_14','Close Price']]
y = df['Call']
# Random 70/30 train/test split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3)
lr1.fit(x_train,y_train)
y_pred = lr1.predict(x_test)
print("Accuracy=",accuracy_score(y_test,y_pred,normalize=True))
```
As described by KolaB, you should use the `random_state` parameter of `train_test_split` to make your results reproducible. But you also mentioned that your accuracy score varies between 0.3 and 0.8. That is a strong indicator that your results depend on which rows happen to be chosen for the test set, so I would suggest using k-fold cross-validation as a countermeasure.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

lr1 = LogisticRegression()
x = df[['Lower_Band','Upper_Band','MA_14','Close Price']]
y = df['Call']
# cross_val_score returns one accuracy per fold; average them for a single estimate
print(f'Accuracy = {cross_val_score(lr1, x, y, cv=5).mean()}')
```
`cross_val_score` returns an array of 5 scores, one per train/test iteration, and across those iterations each sample is used in the test set exactly once. Averaging the 5 scores gives a better estimate of your model's performance than any single split.
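A variant of the same idea, sketched on synthetic stand-in data (hypothetical, since the real `df` isn't available): it passes an explicit `StratifiedKFold` with `shuffle=True` and a fixed `random_state` to `cross_val_score`, so the folds are both shuffled and reproducible, and it reports the spread across folds as well as the mean. Shuffling here is a choice made for this sketch; with a plain `cv=5`, scikit-learn uses stratified folds for classifiers without shuffling.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-in for the question's df
X, y = make_classification(n_samples=120, n_features=4, flip_y=0.2, random_state=0)
x = pd.DataFrame(X, columns=['Lower_Band', 'Upper_Band', 'MA_14', 'Close Price'])

# 5 stratified folds, shuffled once with a fixed seed so the folds are repeatable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(LogisticRegression(), x, y, cv=cv, scoring='accuracy')

print("Per-fold accuracy:", scores.round(2))
print(f"Accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the standard deviation next to the mean makes it visible when a single split would have been misleading.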
Your problem is most likely in `train_test_split`. You are not setting the random state, which is what makes the split reproducible. Try changing the line that calls this function to:

```python
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3, random_state=1)
```

Also see the scikit-learn documentation on the `train_test_split` function.
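A quick check of what `random_state` does, sketched on synthetic stand-in data (hypothetical, not the question's `df`): the hypothetical helper `split_and_score` runs the question's split-and-fit with a given seed, and calling it twice with `random_state=1` picks the same test rows and yields the same accuracy.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the question's df
X, y = make_classification(n_samples=120, n_features=4, flip_y=0.2, random_state=0)
x = pd.DataFrame(X, columns=['Lower_Band', 'Upper_Band', 'MA_14', 'Close Price'])

def split_and_score(seed):
    """Hypothetical helper: split with the given seed, fit, return test rows and accuracy."""
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.3, random_state=seed)
    model = LogisticRegression().fit(x_train, y_train)
    return x_test.index.tolist(), accuracy_score(y_test, model.predict(x_test))

# Two runs with the same seed choose the same test rows and give the same score
rows_a, acc_a = split_and_score(1)
rows_b, acc_b = split_and_score(1)
print(rows_a == rows_b, acc_a == acc_b)
```

Fixing the seed makes a single score repeatable; it does not by itself make that score a more reliable estimate.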