appleii2

Reputation: 13

Preparing Time-Series Data for Keras LSTM - Network Trains with Extremely High Loss

I am running into issues preparing my data for use in Keras's LSTM layer. The data is a 1,600,000-row time-series CSV consisting of a date and three features:

Date        F1    F2    F3
2016-03-01  .252  .316  .690
2016-03-02  .276  .305  .691
2016-03-03  .284  .278  .687
...

My goal is to predict the value of F1 prediction_period timesteps in the future. Understanding that Keras's LSTM layer takes input data in the format (samples, timesteps, dimensions), I wrote the following function to convert my data into a 3D numpy array in that format (using 2016-03-03 as an example):

[[[.284, .278, .687], [.276, .305, .691], [.252, .316, .690]],...other samples...]

This function creates the array by stacking copies of the data, with each copy shifted one step further back in time. lookback is the number of "layers" in the stack and trainpercent is the train/test split:

import pandas as pd
import numpy as np

# prediction_period, trainpercent and lookback are set elsewhere in the script
def loaddata(path):
    df = pd.read_csv(path)
    df.drop(['Date'], axis=1, inplace=True)
    df['label'] = df.F1.shift(periods=-prediction_period)
    df.dropna(inplace=True)

    df_train, df_test = df.iloc[:int(trainpercent * len(df))], df.iloc[int(trainpercent * len(df)):]
    train_X, train_Y = df_train.drop('label', axis=1).copy(), df_train[['label']].copy()
    test_X, test_Y = df_test.drop('label', axis=1).copy(), df_test[['label']].copy()
    train_X, train_Y, test_X, test_Y = train_X.as_matrix(), train_Y.as_matrix(), test_X.as_matrix(), test_Y.as_matrix()
    train_X, train_Y, test_X, test_Y = train_X.astype('float32'), train_Y.astype('float32'), test_X.astype('float32'), test_Y.astype('float32')

    train_X, test_X = stackit(train_X), stackit(test_X)
    train_X, test_X = train_X[:, lookback:, :], test_X[:, lookback:, :]
    train_Y, test_Y = train_Y[lookback:, :], test_Y[lookback:, :]

    train_X = np.reshape(train_X, (train_X.shape[1], train_X.shape[0], train_X.shape[2]))
    test_X = np.reshape(test_X, (test_X.shape[1], test_X.shape[0], test_X.shape[2]))
    train_Y, test_Y = np.reshape(train_Y, (train_Y.shape[0])),  np.reshape(test_Y, (test_Y.shape[0]))
    return train_X, train_Y, test_X, test_Y

def stackit(thearray):
    thelist = []
    for i in range(lookback):
        thelist.append(np.roll(thearray, shift=i, axis=0))
    thelist = tuple(thelist)
    thestack = np.stack(thelist)
    return thestack

While the network accepted the data and did train, the loss values were exceptionally high, which was very surprising considering that the data has a definite periodic trend. To try to isolate the problem, I replaced my dataset and network structure with the sine-wave dataset and structure from this example: http://www.jakob-aungiers.com/articles/a/LSTM-Neural-Network-for-Time-Series-Prediction.

Even with the sine-wave dataset, the loss was still orders of magnitude higher than what the example code produced. I went through my function piece by piece, using a one-column sequential dataset, and compared expected values with the actual values. I didn't find any errors.
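
For illustration, the kind of check I ran looked roughly like this (a toy one-column sequence with lookback = 3; the values are just an example):

import numpy as np

lookback = 3
toy = np.arange(8, dtype='float32').reshape(-1, 1)   # one-column sequence 0, 1, ..., 7

stacked = stackit(toy)                    # stackit() as defined above
print(stacked.shape)                      # (3, 8, 1): (lookback, samples, features)
print(stacked[:, lookback:, :].shape)     # (3, 5, 1) after dropping the rolled-over rows
print(stacked[:, lookback, 0])            # [3. 2. 1.]: the value at t=3 and its two predecessors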

Am I structuring my input data incorrectly for Keras's LSTM layer? If so, what is the proper way to do this? If not, what would you expect to cause these symptoms (extremely high loss which does not decrease over time, even with 40+ epochs) in my function or otherwise?

Thanks in advance for any advice you can provide!

Upvotes: 1

Views: 1108

Answers (2)

Vyom Sharma

Reputation: 75

Here are some things you can do to improve your predictions:

  1. First make sure your input data is centered, i.e. apply some standardization or normalization. You can use the MinMaxScaler or StandardScaler from the sklearn library, or implement some custom scaling based on your data (see the sketch after this list).

  2. Make sure your network (LSTM/GRU/RNN) is big enough to capture the complexity in your data.

  3. Use the TensorBoard callback in Keras to monitor your weight matrices and loss functions.

  4. Use an adaptive optimizer instead of setting custom learning parameters, maybe 'adam' or 'adagrad'.
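
As a rough illustration of points 1, 3 and 4, a minimal sketch along these lines (the layer size, epoch/batch settings, log directory and the train_cols/test_cols names are placeholders; the scaler should be fit on the 2-D feature columns before they are stacked into 3-D windows):

from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.callbacks import TensorBoard

# 1. scale the feature columns to [0, 1]: fit on the training rows only,
#    then reuse the same scaler on the test rows
scaler = MinMaxScaler()
train_cols = scaler.fit_transform(train_cols)
test_cols = scaler.transform(test_cols)

# 2. and 4. a small LSTM compiled with an adaptive optimizer
model = Sequential()
model.add(LSTM(32, input_shape=(lookback, 3)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

# 3. TensorBoard callback to watch the loss and the weight histograms
tb = TensorBoard(log_dir='./logs', histogram_freq=1)
model.fit(train_X, train_Y, epochs=40, batch_size=128,
          validation_data=(test_X, test_Y), callbacks=[tb])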

Using these will at least make sure that your network is training. You should see a gradual decrease in loss over time. After you've solved this problem, you are free to experiment with your initial hyper-parameters and implement different regularization techniques.

Good luck!

Upvotes: 1

Nassim Ben

Reputation: 11553

A "high loss" is a very subjective thing. We can not assess this without seeing your model.

It can come from multiple reasons:

  • training loss can be influenced by regularization techniques. For example, the whole point of L2 regularization is to add a penalty on the model's weights to the loss (see the sketch after this list).
  • the loss is defined by an objective function, so it depends on what objective you are using.
  • the optimizer you are using for that objective function might not be well suited. Some optimizers do not guarantee convergence of the loss.
  • your time series might not be predictable (but apparently this is not your case).
  • your model might not be adequate for the task you are trying to achieve.
  • your training data is not correctly prepared (but you have investigated this).
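
To make the first point concrete, a minimal sketch (the 10-step window, 32 units and 0.01 penalty are arbitrary): both models use the same 'mse' objective, but the second one's reported training loss also includes the weight penalty term, so it prints a larger number even for an equally good fit.

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.regularizers import l2

# same architecture; the second model's training loss additionally includes the
# 0.01 * sum(weights ** 2) penalty, so even a perfect fit reports a non-zero loss
model_plain = Sequential([LSTM(32, input_shape=(10, 3)), Dense(1)])
model_l2 = Sequential([
    LSTM(32, input_shape=(10, 3), kernel_regularizer=l2(0.01)),
    Dense(1, kernel_regularizer=l2(0.01)),
])
model_plain.compile(loss='mse', optimizer='adam')
model_l2.compile(loss='mse', optimizer='adam')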

You see that there are plenty of possibilities. A high loss doesn't mean anything in itself. You could have a really small loss, add 1000 to it, and your loss would be high even though the problem is solved.

Upvotes: 0
