Reputation: 1262
In a database there is time-series data with the following records:
device
- timestamp
- temperature
- min limit
- max limit
device
- timestamp
- temperature
- min limit
- max limit
device
- timestamp
- temperature
- min limit
- max limit
For every device there are 4 hours of time-series data (at 5-minute intervals) before an alarm was raised, and 4 hours of time-series data (again at 5-minute intervals) that didn't raise any alarm. This graph better describes the representation of the data for every device:
I need to use the RNN class in Python for alarm prediction. We define an alarm as the temperature going below the min limit or above the max limit.
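A minimal sketch of that labeling rule, assuming the temperature and both limits are available as values aligned by timestamp (the function name `alarm_labels` is hypothetical):

```python
import numpy as np


def alarm_labels(temperature, min_limit, max_limit):
    """Return 1 for each reading outside [min_limit, max_limit], else 0.

    The limits may be scalars or arrays aligned with `temperature`.
    """
    t = np.asarray(temperature, dtype=float)
    below = t < np.asarray(min_limit, dtype=float)
    above = t > np.asarray(max_limit, dtype=float)
    return (below | above).astype(int)


# Example with limits of 10 and 30 degrees (30 itself is still within limits)
print(alarm_labels([20.0, 35.0, 5.0, 30.0], 10.0, 30.0))  # prints [0 1 1 0]
```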
After reading the official TensorFlow documentation here, I'm having trouble understanding how to feed input to the model. Should I normalise the data beforehand, and if so, how?
Reading the answers here didn't help me either to get a clear view of how to transform my data into an acceptable format for the RNN model.
Any help on what the X and Y in model.fit should look like for my case?
If you see any other issue with this problem, feel free to comment on it.
PS. I have already set up Python in Docker with TensorFlow, Keras, etc., in case this information helps.
Upvotes: 7
Views: 985
Reputation: 91
Yes, you should normalize your data. I would look at differencing by day, i.e. a difference interval of 24 hours / 5 minutes = 288 steps. You could also try a yearly difference, but that depends on your choice of window size (remember that RNNs don't do well with large windows). You may also want to use a log transformation, as the user above said, but the data seems to be somewhat stationary, so I could also see that not being needed.
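A minimal sketch of the two transforms, assuming a single 1-D temperature series per device at 5-minute resolution. The helper names are hypothetical; the statistics are fit on the training portion only so that test data does not leak into the scaling. Note that a daily difference needs more than 288 points of history, which is longer than the 4-hour (48-step) windows described in the question.

```python
import numpy as np

STEPS_PER_DAY = 24 * 60 // 5  # 288 five-minute steps in 24 hours


def zscore_fit(train):
    """Compute mean and standard deviation on training data only."""
    train = np.asarray(train, dtype=float)
    # Fall back to 1.0 for a constant series to avoid dividing by zero.
    return train.mean(), train.std() or 1.0


def zscore_apply(x, mean, std):
    return (np.asarray(x, dtype=float) - mean) / std


def zscore_invert(z, mean, std):
    """Map normalized values back to temperature units."""
    return np.asarray(z, dtype=float) * std + mean


def seasonal_difference(x, lag=STEPS_PER_DAY):
    """Return x[t] - x[t - lag]; the result is `lag` elements shorter than x."""
    x = np.asarray(x, dtype=float)
    if len(x) <= lag:
        raise ValueError("series must be longer than the differencing lag")
    return x[lag:] - x[:-lag]
```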
For your model.fit, you are technically training the equivalent of a language model, where you predict the next output. So your inputs will be the preceding x values and preceding normalized y values for whatever window size you choose, and your target value will be the normalized output at a given time step t. Just so you know, a 1-D ConvNet is good for classification, but the RNN is a good call because of the temporal aspect of temperature spikes.
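A sketch of this windowing, assuming a univariate, already-normalized temperature series (so the "x values" here are simply past temperatures; extra features could be stacked on the last axis). `make_next_step_windows` is a hypothetical helper:

```python
import numpy as np


def make_next_step_windows(series, window):
    """Split a 1-D normalized series into next-step training pairs.

    X[i] holds series[i : i + window] and y[i] holds series[i + window],
    so X has shape (num_samples, window, 1) as Keras RNN layers expect.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - window
    if n <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([series[i:i + window] for i in range(n)])[..., np.newaxis]
    y = series[window:]
    return X, y


X, y = make_next_step_windows(np.arange(10.0), window=4)
print(X.shape, y.shape)  # prints (6, 4, 1) (6,)
```

The resulting arrays could then be passed as `model.fit(X, y)` to a Keras model whose first layer accepts input of shape `(window, 1)`.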
Once you have trained a model on the x values and normalized y values and can tell that it is actually learning (converging), you can use model.predict with the preceding x values and preceding normalized y values. Take the output and un-normalize it to get an actual temperature value, or keep the normalized value and feed it back into the model to get the t+2 prediction.
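A sketch of the feed-back loop, assuming z-score normalization and a model called like Keras `model.predict` on a batch of shape `(1, window, 1)` that returns shape `(1, 1)`. `forecast_recursive` is a hypothetical name, and a dummy persistence function (repeat the last value) stands in for a trained model in the example:

```python
import numpy as np


def forecast_recursive(predict_fn, history, steps, mean, std):
    """Roll a one-step model forward by feeding its predictions back in.

    predict_fn: takes an array of shape (1, window, 1), returns shape (1, 1).
    history: the last `window` normalized values.
    mean, std: the statistics used for normalization, to un-normalize.
    Returns the forecasts in original temperature units.
    """
    window = np.asarray(history, dtype=float).copy()
    preds = []
    for _ in range(steps):
        batch = window[np.newaxis, :, np.newaxis]
        next_z = float(np.asarray(predict_fn(batch)).reshape(-1)[0])
        preds.append(next_z)
        # Drop the oldest value and append the prediction (t+1, then t+2, ...).
        window = np.append(window[1:], next_z)
    return np.asarray(preds) * std + mean


persistence = lambda x: x[:, -1:, 0]  # stand-in for model.predict
print(forecast_recursive(persistence, [0.0, 0.5, 1.0], 3, 20.0, 2.0))  # prints [22. 22. 22.]
```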
Upvotes: 0
Reputation: 1091
You can begin with the snippet you mention in the question.
Any help on how the X and Y in model.fit should look like for my case?
X should be a numpy array of shape [num samples, sequence length, D], where D is the number of values per timestamp. I suppose D=1 in your case, because you only pass the temperature value.
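A sketch of assembling that array, assuming each device contributes one 48-step window before an alarm (label 1) and one window without an alarm (label 0), as described in the question, and that the windows are already normalized. `build_dataset` is a hypothetical helper, and reading the windows from the database is left out:

```python
import numpy as np

STEPS_PER_WINDOW = 4 * 60 // 5  # 4 hours at 5-minute intervals = 48 steps


def build_dataset(alarm_windows, normal_windows):
    """Stack windows into X of shape (num_samples, 48, D) and binary labels y.

    Each window has shape (48,) for D=1 (temperature only) or (48, D) for
    several values per timestamp. Alarm windows get label 1, the others 0.
    """
    alarm = [np.asarray(w, dtype=float) for w in alarm_windows]
    normal = [np.asarray(w, dtype=float) for w in normal_windows]
    X = np.stack(alarm + normal)
    if X.ndim == 2:  # temperature only: add the feature axis, D = 1
        X = X[..., np.newaxis]
    if X.shape[1] != STEPS_PER_WINDOW:
        raise ValueError(f"expected {STEPS_PER_WINDOW} time steps per window")
    y = np.concatenate([np.ones(len(alarm)), np.zeros(len(normal))])
    return X, y
```

Because the alarm samples come first, shuffle the arrays (or split by device) before using Keras' `validation_split`, which takes the last samples before any shuffling.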
y should be a vector of target values (as in the snippet): either binary (alarm/not alarm) or continuous (e.g. max temperature deviation). In the latter case you'd need to replace the sigmoid activation with something else.
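A sketch of how the target type changes the output layer, assuming TensorFlow 2.x Keras, an LSTM layer (any Keras recurrent layer would do) and 32 units chosen arbitrarily; `output_config` and `build_model` are hypothetical names, and TensorFlow is only imported when `build_model` is called:

```python
def output_config(binary=True):
    """Pick the output activation and loss for the chosen kind of target y."""
    if binary:
        return "sigmoid", "binary_crossentropy"  # alarm / not alarm
    return "linear", "mse"  # continuous target, e.g. max temperature deviation


def build_model(seq_len=48, n_features=1, binary=True, units=32):
    """Small recurrent model for X of shape (num_samples, seq_len, n_features)."""
    from tensorflow import keras  # lazy import: only needed to build the model

    activation, loss = output_config(binary)
    model = keras.Sequential([
        keras.Input(shape=(seq_len, n_features)),
        keras.layers.LSTM(units),
        keras.layers.Dense(1, activation=activation),
    ])
    model.compile(optimizer="adam", loss=loss)
    return model
```

With arrays shaped as described above, `build_model().fit(X, y, epochs=10)` would train the binary variant.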
Should i normalise the data beforehand
Yes, it's essential to preprocess your raw data. I see 2 crucial things to do here:
Finally, I'd say this task is more complex than it seems. You might want to find either a good starter tutorial on time-series classification or a course on machine learning in general. I believe you can find a better method than an RNN.
Upvotes: 2