Reputation: 297
I am trying to make an LSTM for time series prediction in Keras. In particular, it should predict unseen values once the model is trained. A visualisation of the time series is shown below.
The model is trained on the blue time series, and predictions are compared to the orange time series.
For prediction, I want to take the last n points of the training data (where n is the sequence length), run a prediction, and use this prediction for a consecutive (second) prediction, i.e.:
prediction(t+1) = model(obs(t-1), obs(t-2), ..., obs(t-n))
prediction(t+2) = model(prediction(t+1), obs(t-1), ..., obs(t-n+1))
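In code, the rolling forecast I have in mind would look roughly like this (just a sketch; `model` is assumed to be an already-trained Keras model and `last_window` holds the last n observations, shaped (n, 1) to match its input):

import numpy as np

def rolling_forecast(model, last_window, steps):
    """Feed each prediction back in as input for the next step.

    `model` is an already-trained Keras model and `last_window` holds the
    last n observations, shaped (n, 1) to match the model's input.
    """
    window = last_window.copy()
    predictions = []
    for _ in range(steps):
        next_val = model.predict(window[np.newaxis, :, :])[0, 0]  # add a batch dimension
        predictions.append(next_val)
        window = np.vstack([window[1:], [[next_val]]])            # slide the window
    return np.array(predictions)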
I have tried to get this to work, but so far without success. I am at a loss as to whether I should use a stateful or a stateless model, and what a good value for the sequence length would be. Does anyone have experience with this?
I have read and tried various tutorials, but none seem to be applicable to my kind of data.
Because I want to run consecutive predictions, I would need a stateful model to prevent Keras from resetting the states after each call to model.predict, but training with a batch size of 1 takes forever... Or is there a way to circumvent this problem?
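Concretely, the kind of workaround I'm wondering about is training with a normal batch size and then copying the weights into a stateful, batch-size-1 copy of the model used only for prediction, roughly like this (a sketch with made-up sizes and dummy data):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def build(hidden_dim, seq_size, batch_size=None, stateful=False):
    model = Sequential()
    if stateful:
        # a stateful model needs a fixed batch size baked into its input shape
        model.add(LSTM(hidden_dim,
                       batch_input_shape=(batch_size, seq_size, 1),
                       stateful=True))
    else:
        model.add(LSTM(hidden_dim, input_shape=(seq_size, 1)))
    model.add(Dense(1))
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model

# Train with a normal batch size on dummy data standing in for the real series...
x = np.random.rand(128, 50, 1)
y = np.random.rand(128, 1)
train_net = build(hidden_dim=32, seq_size=50)
train_net.fit(x, y, batch_size=64, epochs=1, verbose=0)

# ...then copy the weights into a stateful, batch-size-1 copy used only for
# prediction; the layer weights themselves do not depend on the batch size.
pred_net = build(hidden_dim=32, seq_size=50, batch_size=1, stateful=True)
pred_net.set_weights(train_net.get_weights())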
Upvotes: 6
Views: 3904
Reputation: 1852
A stateful LSTM is used when the whole sequence plays a part in forming the output. Take an extreme case: you might have a 1000-length sequence where the very first character is what actually defines the output.
Stateful: If you were to batch this into 10 x 100-length sequences, then with a stateful LSTM the state is retained between the sequences in the batch, and the network would (with enough examples) learn that the first character plays a significant role in determining the output. In effect, the sequence length is immaterial, because the network's state is persisted across the whole stretch of data; you simply batch it as a means of supplying the data.
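Mechanically, a stateful setup looks roughly like this (a sketch with made-up dimensions and dummy data; the essential pieces are stateful=True, a fixed batch_input_shape, shuffle=False, and calling reset_states() yourself when the state should be cleared):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Made-up dimensions and dummy data, purely to show the mechanics.
batch_size, seq_size, n_features = 10, 100, 1
x = np.random.rand(100, seq_size, n_features)
y = np.random.rand(100, 1)

model = Sequential()
model.add(LSTM(32,
               batch_input_shape=(batch_size, seq_size, n_features),
               stateful=True))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")

# With stateful=True, Keras carries the LSTM state over between batches, so
# you decide when it gets cleared -- typically once per pass over the series.
for epoch in range(5):
    model.fit(x, y, batch_size=batch_size, epochs=1, shuffle=False, verbose=0)
    model.reset_states()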
Stateless: During training, the state is reset after each sequence. So in the example above, the network would never learn that it's the first character of the 1000-length sequence that defines the output: the first character and the final output value end up in separate sequences, and because the state isn't retained between sequences, the long-term dependency is never seen.
Summary: What you need to determine is whether the data at the end of your time series is likely to depend on what happened right at the start.
I would say that such long-term dependencies are actually quite rare, and you're probably better off using a stateless LSTM and treating the sequence length as a hyperparameter, searching for the length that best models the data, i.e. gives the most accurate predictions on validation data.
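For example (a rough sketch only, with an invented windowing helper, synthetic data, and made-up layer sizes), that search could look like this:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def make_windows(series, seq_size):
    """Slide a window over a 1-D series to build (samples, seq_size, 1) inputs."""
    x = [series[i:i + seq_size] for i in range(len(series) - seq_size)]
    y = [series[i + seq_size] for i in range(len(series) - seq_size)]
    return np.array(x)[..., np.newaxis], np.array(y)

series = np.sin(np.linspace(0, 50, 1000))        # synthetic stand-in data
split = 800
best = None
for seq_size in (10, 25, 50, 100):               # sequence length as a hyperparameter
    x_train, y_train = make_windows(series[:split], seq_size)
    x_val, y_val = make_windows(series[split - seq_size:], seq_size)
    model = Sequential()
    model.add(LSTM(32, input_shape=(seq_size, 1)))
    model.add(Dense(1))
    model.compile(loss="mean_squared_error", optimizer="adam")
    model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=0)
    val_loss = model.evaluate(x_val, y_val, verbose=0)
    if best is None or val_loss < best[1]:
        best = (seq_size, val_loss)
print("best sequence length, validation loss:", best)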
Upvotes: 9
Reputation: 297
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, Dropout


class LSTMNetwork(object):

    def __init__(self, hidden_dim1, hidden_dim2, batch_size, seq_size):
        super(LSTMNetwork, self).__init__()
        self.hidden_dim1 = hidden_dim1
        self.hidden_dim2 = hidden_dim2
        self.batch_size = batch_size
        self.seq_size = seq_size
        self.model = self.build_model(hidden_dim1, hidden_dim2, batch_size, seq_size)

    def build_model(self, hidden_dim1, hidden_dim2, batch_size, seq_size):
        """
        Build and return the model
        """
        model = Sequential()
        # First LSTM and dropout layer
        model.add(LSTM(hidden_dim1, input_shape=(seq_size, 1), return_sequences=True))
        #model.add(Dropout(0.2))
        # Second LSTM and dropout layer
        model.add(LSTM(hidden_dim2, return_sequences=False))
        #model.add(Dropout(0.2))
        # Fully connected output layer with linear activation
        model.add(Dense(1))
        model.add(Activation("linear"))
        model.compile(loss="mean_squared_error", optimizer="adam")
        return model

    def predict(self, x):
        """
        Given an input batch x, predict the output
        """
        return self.model.predict(x)

    def train_model(self, x, y, num_epochs):
        self.model.fit(x, y, epochs=num_epochs, batch_size=self.batch_size)

    def predict_sequence(self, x, n, seq_size):
        """
        Given samples of shape [num_samples x seq_size x num_features], take the
        last window and predict the next n values, feeding each prediction back
        in as input for the following step.
        """
        curr_window = x[-1, :, :]
        predicted = []
        for i in range(n):
            # predict one step ahead from the current window (add a batch dimension)
            predicted.append(self.predict(curr_window[np.newaxis, :, :])[0, 0])
            # slide the window: drop the oldest value and append the new prediction
            curr_window = curr_window[1:]
            curr_window = np.insert(curr_window, [seq_size - 1], predicted[-1], axis=0)
        return predicted

    def preprocess_data(self, data, seq_size):
        """
        Generate training and target samples in a sliding-window fashion.
        Training samples are of shape [num_samples x seq_size x num_features],
        target samples are of shape [num_samples, ]
        """
        x = []
        y = []
        for i in range(len(data) - seq_size):
            window = data[i:(i + seq_size)]
            after_window = data[i + seq_size]
            x.append([[value] for value in window])
            y.append(after_window)
        return np.array(x), np.array(y)
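For completeness, I'm calling it roughly like this (the sine wave and the numbers below are just placeholders for my actual data and settings):

import numpy as np

# Stand-in for my actual series and settings
data = np.sin(np.linspace(0, 50, 1000))
train = data[:800]

seq_size = 50
net = LSTMNetwork(hidden_dim1=32, hidden_dim2=32, batch_size=32, seq_size=seq_size)
x, y = net.preprocess_data(train, seq_size)
net.train_model(x, y, num_epochs=10)

# take the last training window and roll the prediction forward 100 steps
future = net.predict_sequence(x, n=100, seq_size=seq_size)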
After training, this predicts a straight line when I take the last row of the training set as input and run predict_sequence on it. Could this be because the states of the model are reset after each call to model.predict()?
Upvotes: 0