Mislav
Mislav

Reputation: 1563

0 accuracy with LSTM

I trained LSTM classification model, but got weird results (0 accuracy). Here is my dataset with preprocessing steps:

import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
import numpy as np

url = 'https://raw.githubusercontent.com/MislavSag/trademl/master/trademl/modeling/random_forest/X_TEST.csv'
X_TEST = pd.read_csv(url, sep=',')
url = 'https://raw.githubusercontent.com/MislavSag/trademl/master/trademl/modeling/random_forest/labeling_info_TEST.csv'
labeling_info_TEST = pd.read_csv(url, sep=',')


# TRAIN TEST SPLIT
X_train, X_test, y_train, y_test = train_test_split(
    X_TEST.drop(columns=['close_orig']), labeling_info_TEST['bin'],
    test_size=0.10, shuffle=False, stratify=None)


### PREPARE LSTM
x = X_train['close'].values.reshape(-1, 1)
y = y_train.values.reshape(-1, 1)
x_test = X_test['close'].values.reshape(-1, 1)
y_test = y_test.values.reshape(-1, 1)
train_val_index_split = 0.75
train_generator = keras.preprocessing.sequence.TimeseriesGenerator(
    data=x,
    targets=y,
    length=30,
    sampling_rate=1,
    stride=1,
    start_index=0,
    end_index=int(train_val_index_split*X_TEST.shape[0]),
    shuffle=False,
    reverse=False,
    batch_size=128
)
validation_generator = keras.preprocessing.sequence.TimeseriesGenerator(
    data=x,
    targets=y,
    length=30,
    sampling_rate=1,
    stride=1,
    start_index=int((train_val_index_split*X_TEST.shape[0] + 1)),
    end_index=None,  #int(train_test_index_split*X.shape[0])
    shuffle=False,
    reverse=False,
    batch_size=128
)
test_generator = keras.preprocessing.sequence.TimeseriesGenerator(
    data=x_test,
    targets=y_test,
    length=30,
    sampling_rate=1,
    stride=1,
    start_index=0,
    end_index=None,
    shuffle=False,
    reverse=False,
    batch_size=128
)

# convert generator to inmemory 3D series (if enough RAM)
def generator_to_obj(generator):
    xlist = []
    ylist = []
    for i in range(len(generator)):
        x, y = train_generator[i]
        xlist.append(x)
        ylist.append(y)
    X_train = np.concatenate(xlist, axis=0)
    y_train = np.concatenate(ylist, axis=0)
    return X_train, y_train

X_train_lstm, y_train_lstm = generator_to_obj(train_generator)
X_val_lstm, y_val_lstm = generator_to_obj(validation_generator)
X_test_lstm, y_test_lstm = generator_to_obj(test_generator)

# test for shapes
print('X and y shape train: ', X_train_lstm.shape, y_train_lstm.shape)
print('X and y shape validate: ', X_val_lstm.shape, y_val_lstm.shape)
print('X and y shape test: ', X_test_lstm.shape, y_test_lstm.shape)

and here is my model with resuslts:

### MODEL
model = keras.models.Sequential([
        keras.layers.LSTM(124, return_sequences=True, input_shape=[None, 1]),
        keras.layers.LSTM(258),
        keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train_lstm, y_train_lstm, epochs=10, batch_size=128,
                    validation_data=[X_val_lstm, y_val_lstm])
# history = model.fit_generator(train_generator, epochs=40, validation_data=validation_generator, verbose=1)
score, acc = model.evaluate(X_val_lstm, y_val_lstm,
                            batch_size=128)

historydf = pd.DataFrame(history.history)
historydf.head(10)

Why do I get 0 accuracy?

Upvotes: 8

Views: 703

Answers (1)

Zabir Al Nazi Nabil
Zabir Al Nazi Nabil

Reputation: 11198

You're using sigmoid activation, which means your labels must be in range 0 and 1. But in your case, the labels are 1. and -1.

Just replace -1 with 0.

for i, y in enumerate(y_train_lstm):
    if y == -1.:
        y_train_lstm[i,:] = 0. 
for i, y in enumerate(y_val_lstm):
    if y == -1.:
        y_val_lstm[i,:] = 0. 

for i, y in enumerate(y_test_lstm):
    if y == -1.:
        y_test_lstm[i,:] = 0. 

Sidenote:

enter image description here

The signals are very close, it would be hard to distinguish them. So, probably accuracy won't be high with simple models.

After training with 0. and 1. labels,

model = keras.models.Sequential([
        keras.layers.LSTM(124, return_sequences=True, input_shape=(30, 1)),
        keras.layers.LSTM(258),
        keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train_lstm, y_train_lstm, epochs=5, batch_size=128,
                    validation_data=(X_val_lstm, y_val_lstm))
# history = model.fit_generator(train_generator, epochs=40, validation_data=validation_generator, verbose=1)
score, acc = model.evaluate(X_val_lstm, y_val_lstm,
                            batch_size=128)

historydf = pd.DataFrame(history.history)
historydf.head(10)
Epoch 1/5
12/12 [==============================] - 5s 378ms/step - loss: 0.7386 - accuracy: 0.4990 - val_loss: 0.6959 - val_accuracy: 0.4896
Epoch 2/5
12/12 [==============================] - 4s 318ms/step - loss: 0.6947 - accuracy: 0.5133 - val_loss: 0.6959 - val_accuracy: 0.5104
Epoch 3/5
12/12 [==============================] - 4s 318ms/step - loss: 0.6941 - accuracy: 0.4895 - val_loss: 0.6930 - val_accuracy: 0.5104
Epoch 4/5
12/12 [==============================] - 4s 332ms/step - loss: 0.6946 - accuracy: 0.5269 - val_loss: 0.6946 - val_accuracy: 0.5104
Epoch 5/5
12/12 [==============================] - 4s 334ms/step - loss: 0.6931 - accuracy: 0.4901 - val_loss: 0.6929 - val_accuracy: 0.5104
3/3 [==============================] - 0s 73ms/step - loss: 0.6929 - accuracy: 0.5104

    loss    accuracy    val_loss    val_accuracy
0   0.738649    0.498980    0.695888    0.489583
1   0.694708    0.513256    0.695942    0.510417
2   0.694117    0.489463    0.692987    0.510417
3   0.694554    0.526852    0.694613    0.510417
4   0.693118    0.490143    0.692936    0.510417

Source code in colab: https://colab.research.google.com/drive/10yRf4TfGDnp_4F2HYoxPyTlF18no-8Dr?usp=sharing

Upvotes: 7

Related Questions