Borislav Stoilov

Reputation: 3697

LSTM Model accuracy caps and I can't improve it

I am trying to do a proof of concept LSTM model for forex prediction.

After lots of reading I came up with the following model (I believe it's called a stacked LSTM):

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, x_train.shape[2])))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(n_features, return_sequences=True))

model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x_train, y_train, epochs=100, batch_size=1, verbose=2, validation_data=(x_test, y_test))

And here is the actual (blue) vs predicted (yellow) data

[plot: prediction accuracy, actual vs. predicted]

Everything else I did performed worse.

The loss stops improving around epoch 70, and further training after that has no effect. I use MinMaxScaler on the data:

from sklearn.preprocessing import MinMaxScaler

self.scaler = MinMaxScaler(feature_range=(0, 1))
self.scaler = self.scaler.fit(self.raw)
self.raw = self.scaler.transform(self.raw)

Without scaling, the differences in the predictions become so small that the output looks like a straight horizontal line.

Is there anything I can do to improve the model? How do I choose the right number of LSTM layers and the hidden size for each of them? I tried adding Dropout layers, as several online resources suggested, but there was no improvement.

If I need to provide other parts of the code just let me know.

Upvotes: 1

Views: 694

Answers (1)

jlh

Reputation: 4717

Having been there and tried that as well, I can comment on a few points. But in general, I believe this is a really hard task to solve.

  • Is there a reason why you train with a batch size of 1? I have seen papers arguing that small batch sizes (down to 2) can be useful, but it's not common practice. Try a batch size of 32, which is a good starting point (the first sketch after this list uses it).

  • What is the value of n_features, and what kind of features are you expecting the model to produce? Is it just a single value, the price at the next step? Perhaps also consider an alternative approach: instead of expecting the model to predict the precise future price, make it predict only whether the price goes up or down at the next step, and then use a categorical cross-entropy loss (see the first sketch after this list). There are many other potential variants; I once experimented with a model that directly outputs the actions "sell", "buy" and "do nothing".

  • What exactly is x_train.shape[2] in your setup? Is it 1, i.e. just something like the average price over a fixed time interval (like 10 minutes)? It would probably help to provide additional data to the model. For example, for each 10-minute interval you could provide:

    • Average price
    • Highest price
    • Lowest price
    • Trade volume
    • Current price / previous price
    • Time of day
    • Day of week
    • Date of year
    • Price of a related currency/stock/whatever
    • anything else you can think of

    Note: Periodic data (like time of day) is probably best fed into the model as a sin(t), cos(t) pair (see the second sketch after this list).

  • The MinMaxScaler (or similar) is absolutely necessary because you cannot feed raw values (like 100) into a model; it just can't deal with that. You need to rescale the values into a range the model can handle (usually 0 to 1 or -1 to 1). This is called normalizing the data. Make sure you apply the exact same scaling to all datasets that you work with (see the last sketch below).
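
To illustrate the up/down idea from the second point (and the batch size of 32 from the first), here is a minimal, untested sketch. The window length, feature count and layer sizes are placeholders, and y_train_dir / y_test_dir stand for one-hot "down"/"up" labels you would have to build yourself:

from keras.models import Sequential
from keras.layers import LSTM, Dense

n_timesteps, n_features = 60, 5  # placeholders: your window length and number of input features

# Classify the direction of the next step instead of regressing the exact price
clf = Sequential()
clf.add(LSTM(64, input_shape=(n_timesteps, n_features)))
clf.add(Dense(2, activation='softmax'))  # two classes: price goes down / price goes up

clf.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# y_train_dir / y_test_dir: one-hot direction labels, e.g. [1, 0] = down, [0, 1] = up
# clf.fit(x_train, y_train_dir, epochs=100, batch_size=32, validation_data=(x_test, y_test_dir))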
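
For the note about periodic features, the usual trick is to map each cyclic quantity onto the unit circle, so that e.g. 23:59 and 00:01 end up close together instead of at opposite ends of the range. A small sketch with made-up values:

import numpy as np

# Time of day in minutes (0..1439), a few example values
minutes_of_day = np.array([0, 360, 720, 1080, 1439])

# Map onto the unit circle; feed both columns to the model instead of the raw minute count
angle = 2 * np.pi * minutes_of_day / 1440.0
time_sin = np.sin(angle)
time_cos = np.cos(angle)

# The same idea works for day of week (period 7) or day of year (period ~365)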
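
And for the last point about scaling: the detail that is easy to get wrong is that the scaler should be fitted once (on the training data) and then reused, unchanged, for every other dataset. A minimal sketch with placeholder arrays:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

raw_train = np.array([[100.0], [105.0], [110.0]])  # placeholder price data
raw_test = np.array([[104.0], [108.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(raw_train)                      # learn min/max from the training data only
train_scaled = scaler.transform(raw_train)
test_scaled = scaler.transform(raw_test)   # the exact same scaling applied to the test data

# Predictions can be mapped back to real prices with scaler.inverse_transform(...)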

Upvotes: 2
