Reputation:
I am trying to implement an LSTM model on data that is measured once per day, at a different time of day each day.
For example, let's say the last input of my data set was measured on May 16, 2018. My data looks like this:
      Velocity  Time
0       56.122  3600
1       56.114  3601
...        ...   ...
3599    75.043  7199
The time is in seconds: 3600 to 7199 corresponds to 1:00 am to 2:00 am.
Now let's say the previous input was measured the day before (May 15), from 00:00 to 00:15:
     Velocity  Time
0       6.232     0
1       6.197     1
...       ...   ...
899     5.507   899
The problem is that I don't know how to handle the 'Time' feature when building my LSTM model.
At the moment, I have padded my data so that all inputs have the same shape. For example, the input of May 15 now looks like this:
      Velocity  Time
0        6.232     0
1        6.197     1
...        ...   ...
899      5.507   899
900         -1    -1
...        ...   ...
3599        -1    -1
In this example I assumed that 1 hour (3600 samples) is the maximum length of an input.
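For clarity, here is a minimal sketch of that padding step, assuming each day's measurements are stored as a NumPy array of shape (length, 2) with [velocity, time] columns (the name daily_sequences is just a placeholder):

```python
import numpy as np

MAX_LEN = 3600    # assumed maximum number of samples per input (1 hour at 1 Hz)
PAD_VALUE = -1.0  # value used for the padded rows

def pad_day(day_array, max_len=MAX_LEN, pad_value=PAD_VALUE):
    """Pad a (length, 2) array of [velocity, time] rows to shape (max_len, 2)."""
    padded = np.full((max_len, day_array.shape[1]), pad_value, dtype=float)
    padded[:len(day_array)] = day_array
    return padded

# daily_sequences would be a list of per-day arrays of varying length:
# X_train = np.stack([pad_day(day) for day in daily_sequences])  # shape (n_days, 3600, 2)
```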
Do I need to transform the time into categorical data? I ask because I normalized my data (not shown here) with sklearn.preprocessing.MinMaxScaler, and I did this before the padding. If not, do I need to scale the time?
I have 11200 inputs, and each input in X_train has a shape of (3600, 2). There is one output per input (a boolean, True or False).
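To make the setup concrete, here is a rough sketch of the kind of model I have in mind, assuming tf.keras (this is just an illustration; the Masking layer is one way to make the LSTM skip the -1 padding rows):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # mask_value=-1.0 tells downstream layers to ignore the padded time steps
    tf.keras.layers.Masking(mask_value=-1.0, input_shape=(3600, 2)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # one boolean output per sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, ...)  # X_train: (11200, 3600, 2), y_train: (11200,)
```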
Thanks.
Upvotes: 0
Views: 207
Reputation: 11225
You don't need to transform the time into categorical data. Normalising is a good starting point. That said, if you want, you can make the time discrete, for example by rounding it to the nearest hour, and then treat it as a category; this will of course change the information the network gets.
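A minimal sketch of that idea, assuming the raw time column holds seconds since midnight (the array below is just an illustration):

```python
import numpy as np

# stand-in for the raw time column, in seconds since midnight
seconds = np.array([0, 1, 3600, 3601, 7199])

hour = np.rint(seconds / 3600).astype(int) % 24  # round to the nearest hour (0..23)
one_hot = np.eye(24)[hour]                       # categorical (one-hot) encoding, shape (n, 24)
```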
Another approach could be to take the differences between consecutive measurements in seconds and normalise those. That way the input would not be biased by a constantly increasing time value.
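For example, a sketch under the same assumption that the time column is in seconds:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# stand-in for the raw time column of one sequence
seconds = np.array([0.0, 1.0, 2.0, 4.0, 7.0])

deltas = np.diff(seconds, prepend=seconds[0])  # time since the previous measurement
deltas_scaled = MinMaxScaler().fit_transform(deltas.reshape(-1, 1))  # scaled to [0, 1]
```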
Upvotes: 1