Setting correct input for RNN

Question

I have time-series data in a database, with records of the form:

device - timestamp - temperature - min limit - max limit
device - timestamp - temperature - min limit - max limit
device - timestamp - temperature - min limit - max limit
...

For every device there are 4 hours of time-series data (at 5-minute intervals) recorded before an alarm was raised, and 4 hours of time-series data (again at 5-minute intervals) that did not raise any alarm. This graph illustrates the data for each device:

I need to use an RNN in Python for alarm prediction. An alarm is defined as the temperature going below the min limit or above the max limit.

After reading the official TensorFlow documentation here, I'm having trouble understanding how to set up the input to the model. Should I normalise the data beforehand, and if so, how?

Reading the answers here also didn't give me a clear view of how to transform my data into an acceptable format for the RNN model.

What should the X and Y in model.fit look like for my case?

If you see any other issues with this problem, feel free to comment.

PS. I have already set up Python in Docker with TensorFlow, Keras, etc., in case this information helps.

roman · Accepted Answer

You can begin with the snippet that you mention in the question.

What should the X and Y in model.fit look like for my case?

X should be a NumPy array of shape [num_samples, sequence_length, D], where D is the number of values per timestamp. I suppose D=1 in your case, because you only pass the temperature value. With 4 hours of data at 5-minute intervals, the sequence length is 48 before any trimming.

y should be a vector of target values (as in the snippet), either binary (alarm / no alarm) or continuous (e.g. max temperature deviation). In the latter case you'd need to replace the sigmoid activation with something suited to regression, e.g. a linear output.
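As a minimal sketch of those shapes, assuming each 4-hour window has already been pulled from the database as a sequence of 48 temperatures with a binary label. The build_dataset helper, SEQ_LEN, and the synthetic values are hypothetical names and data for illustration, not anything from your setup:

```python
import numpy as np

SEQ_LEN = 48  # 4 hours at 5-minute intervals (assumption: no missing readings)


def build_dataset(windows):
    """Stack (temperatures, is_alarm) pairs into X and y for model.fit.

    windows: list of (sequence of SEQ_LEN temperatures, bool) pairs,
             one pair per 4-hour window (hypothetical input format).
    """
    X = np.zeros((len(windows), SEQ_LEN, 1), dtype=np.float32)
    y = np.zeros(len(windows), dtype=np.float32)
    for i, (temps, is_alarm) in enumerate(windows):
        temps = np.asarray(temps, dtype=np.float32)
        if temps.shape != (SEQ_LEN,):
            raise ValueError(f"window {i} has shape {temps.shape}, expected ({SEQ_LEN},)")
        X[i, :, 0] = temps              # D = 1: temperature only
        y[i] = 1.0 if is_alarm else 0.0  # binary target: alarm / no alarm
    return X, y


# Synthetic stand-in for real device data
rng = np.random.default_rng(0)
windows = [
    (rng.normal(5.0, 0.5, SEQ_LEN), True),
    (rng.normal(5.0, 0.5, SEQ_LEN), False),
    (rng.normal(4.0, 0.5, SEQ_LEN), True),
]
X, y = build_dataset(windows)
print(X.shape, y.shape)  # (3, 48, 1) (3,)
```

With real data you would get two such windows per device (one alarm, one non-alarm), and the recurrent layer's input shape would be (sequence_length, 1).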

Should I normalise the data beforehand?

Yes, it's essential to preprocess your raw data. I see 2 crucial things to do here:

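1. Normalise the temperature values with min-max scaling or standardization (wiki, sklearn preprocessing). Plus, I'd add a bit of smoothing.
2. Drop some fraction of the last timestamps from all of the time series to avoid information leakage. The readings just before an alarm are already close to the limit, so a model that sees them learns little more than the threshold itself.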
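The two steps above could be sketched roughly as follows. This is only one possible arrangement: the preprocess helper is hypothetical, dropping the last 6 timestamps (30 minutes) and using a 3-point moving average are arbitrary assumptions to tune, and standardization is used here rather than min-max scaling:

```python
import numpy as np


def preprocess(X, drop_last=6, smooth_window=3, mean=None, std=None):
    """Trim, standardize, and smooth raw temperature windows.

    X: array of shape [num_samples, seq_len, 1] with raw temperatures.
    drop_last, smooth_window: illustrative defaults, not recommendations.
    mean, std: pass the training-set statistics when transforming
               validation or test data, so nothing leaks from them.
    """
    X = np.asarray(X, dtype=np.float32)

    # 1. Drop the final timestamps, which sit right before the alarm.
    X = X[:, : X.shape[1] - drop_last, :]

    # 2. Standardize: zero mean, unit variance.
    if mean is None:
        mean = X.mean()
    if std is None:
        std = X.std()
    X = (X - mean) / (std + 1e-8)

    # 3. Moving-average smoothing along the time axis.
    #    mode="valid" shortens each series by smooth_window - 1 steps.
    kernel = np.ones(smooth_window, dtype=np.float32) / smooth_window
    X = np.apply_along_axis(
        lambda series: np.convolve(series, kernel, mode="valid"), 1, X
    )
    return X, mean, std


# Synthetic stand-in for real data: 4 windows of 48 readings each
rng = np.random.default_rng(1)
raw = rng.normal(5.0, 1.0, size=(4, 48, 1))
X_train, train_mean, train_std = preprocess(raw)
print(X_train.shape)  # (4, 40, 1)
```

Note that the sequence length fed to the model becomes 40 here instead of 48, so the model's input shape has to match whatever trimming and smoothing you settle on.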

Finally, I'd say that this task is more complex than it seems. You might want to find either a good starter tutorial on time-series classification or a course on machine learning in general. I believe you can find a better method than an RNN.
