user8530765

Keras LSTM time series data

I am trying to implement an LSTM model on data that is measured once a day, at a different moment of the day each time.

For example, let's say the last input of my data set was measured on May 16, 2018. It looks like this:

        Velocity        Time 
0        56.122         3600
1        56.114         3601
...      ...            ...
3599     75.043         7199

The time is in seconds since midnight, so 3600 to 7199 means from 1:00 am to 2:00 am.

Let's say the previous input was measured the day before (May 15), from 00:00 to 00:15:

        Velocity        Time 
0        6.232           0
1        6.197           1
...      ...             ...
899      5.507           899

The problem is that I don't know how to deal with the 'Time' feature when creating my LSTM model.

At the moment, I have padded my data so that all inputs have the same shape. For example, the input of May 15 now looks like this:

        Velocity        Time 
0        6.232           0
1        6.197           1
...      ...             ...
899      5.507           899
900      -1              -1
...      ...             ...
3599     -1              -1

In this example I assumed that 1 hour was the maximum length of an input.

Do I need to transform the time into categorical data? I ask because I normalized my data (not shown here) with sklearn.preprocessing.MinMaxScaler, before the padding. If not, do I need to scale the time?
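For reference, this is roughly how I scale and pad each day's measurements (a minimal sketch; the scale_and_pad helper, the maximum length and the padding value are my own choices):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    MAX_LEN = 3600      # I assumed 1 hour at one sample per second is the longest input
    PAD_VALUE = -1.0    # marker value for padded timesteps

    def scale_and_pad(days):
        """days: list of (n_timesteps, 2) arrays of [Velocity, Time], one per day."""
        scaler = MinMaxScaler()
        scaler.fit(np.concatenate(days))                   # fit on the real values only
        padded = np.full((len(days), MAX_LEN, 2), PAD_VALUE)
        for i, day in enumerate(days):
            padded[i, :len(day)] = scaler.transform(day)   # scale first, then pad
        return padded                                      # shape (n_days, 3600, 2)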

I have 11200 inputs, so X_train has a shape of (11200, 3600, 2). There is one output for each input (a boolean, True or False).
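For context, the model I have in mind looks roughly like this (a minimal sketch; the LSTM size is a placeholder, and the Masking layer is simply how I intend to handle the -1 padding):

    from keras.models import Sequential
    from keras.layers import Masking, LSTM, Dense

    model = Sequential([
        # Masking tells the LSTM to skip the timesteps padded with -1
        Masking(mask_value=-1.0, input_shape=(3600, 2)),
        LSTM(64),
        Dense(1, activation='sigmoid'),   # one True/False output per sequence
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    # model.fit(X_train, y_train, ...)   # X_train: (11200, 3600, 2), y_train: (11200,)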

Thanks.

Upvotes: 0

Views: 207

Answers (1)

nuric

Reputation: 11225

You don't need to transform the time into categorical data. Normalising is a good starting point. That said, if you want, you can make the time discrete, for example by rounding it to the nearest hour; this way you can categorise it, although it will of course change the information the network gets.
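For example, a rough sketch of discretising the time, assuming a pandas DataFrame shaped like the one in your question (the column names are just for illustration):

    import pandas as pd

    # toy frame shaped like the data in the question
    df = pd.DataFrame({'Velocity': [56.122, 56.114, 75.043],
                       'Time':     [3600,   3601,   7199]})

    df['Hour'] = df['Time'] // 3600                           # bucket each timestep into its hour (0-23)
    hour_dummies = pd.get_dummies(df['Hour'], prefix='hour')  # one-hot encode the hour as categories
    df = pd.concat([df.drop(columns=['Time', 'Hour']), hour_dummies], axis=1)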

Another approach could be to take the differences between consecutive measurements in seconds and normalise those. That way the data wouldn't be biased towards a constantly increasing time input.
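Something along these lines (again just a sketch, with made-up time values):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    times = np.array([3600, 3601, 3605, 3700, 7199])     # the 'Time' column of one input, in seconds
    deltas = np.concatenate([[0], np.diff(times)])        # seconds elapsed since the previous sample
    deltas_scaled = MinMaxScaler().fit_transform(deltas.reshape(-1, 1))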

Upvotes: 1
