GeorgeGeorgitsis

Reputation: 1262

Setting correct input for RNN

In a database there are time-series data with records:

For every device there are 4 hours of time-series data (at 5-minute intervals) recorded before an alarm was raised, and 4 hours of time-series data (again at 5-minute intervals) during which no alarm was raised. The graph below shows how the data looks for each device:

[image: graph of the time-series data for a device]

I need to use an RNN in Python for alarm prediction. An alarm is defined as the temperature going below the min limit or above the max limit.

After reading the official TensorFlow documentation here, I'm having trouble understanding how to set up the input to the model. Should I normalise the data beforehand, and if so, how?

Reading the answers here didn't give me a clear view either of how to transform my data into a format the RNN model accepts.

Any help on what X and Y in model.fit should look like for my case?

If you see any other issue with this problem, feel free to comment.

PS. I have already set up Python in Docker with TensorFlow, Keras, etc., in case this information helps.

Upvotes: 7

Views: 985

Answers (2)

Drew

Reputation: 91

Yes, you should normalize your data. I would look at differencing by day, i.e. a difference interval of 24 hours / 5 minutes = 288 time steps. You can also try a yearly difference, but that depends on your choice of window size (remember that RNNs don't do well with large windows). You may want to use a log transformation like the above user said, but the data also seems to be somewhat stationary, so I could see that not being needed.
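As a rough illustration of the daily differencing mentioned above, here is a minimal NumPy sketch. The function name `seasonal_difference` and the synthetic temperature series are made up for this example. Note that it assumes more than one day of continuous readings is available; a single 4-hour window (48 samples) is shorter than the 288-step lag.

```python
import numpy as np

# 24 hours of readings at a 5-minute interval.
STEPS_PER_DAY = 24 * 60 // 5  # 288


def seasonal_difference(series, lag=STEPS_PER_DAY):
    """Return series[t] - series[t - lag] for every t >= lag."""
    series = np.asarray(series, dtype=float)
    if series.shape[0] <= lag:
        raise ValueError("series must be longer than the lag")
    return series[lag:] - series[:-lag]


# Synthetic data (an assumption, not the asker's data):
# a daily sine cycle plus a slow upward drift.
t = np.arange(3 * STEPS_PER_DAY)
temperature = 20.0 + 5.0 * np.sin(2 * np.pi * t / STEPS_PER_DAY) + 0.001 * t

# The daily cycle cancels out; only the drift over one day remains.
diffed = seasonal_difference(temperature)
```

The differenced series is one lag shorter than the input, so the first day of readings has no differenced value.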

For your model.fit, you are technically training the equivalent of a language model, where you predict the next output. So your inputs will be the preceding x values and preceding normalized y values for whatever window size you choose, and your target value will be the normalized output at a given time step t. Just so you know, a 1-D conv net is good for classification, but good call on the RNN because of the temporal aspect of temperature spikes.
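A minimal sketch of that windowing, assuming the only input feature is the normalized temperature itself (the x values are left out for brevity). `make_windows` is a hypothetical helper, and the toy series stands in for real normalized readings.

```python
import numpy as np


def make_windows(values, window):
    """Build (X, y) pairs for next-step prediction.

    X[i] holds `window` consecutive values and y[i] is the value
    that immediately follows them.
    """
    values = np.asarray(values, dtype=float)
    n_samples = values.shape[0] - window
    if n_samples <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([values[i:i + window] for i in range(n_samples)])
    y = values[window:]
    # Add a trailing feature axis: (n_samples, window, 1).
    return X[..., np.newaxis], y


# Toy series standing in for normalized temperature readings.
series = np.arange(10, dtype=float)
X, y = make_windows(series, window=4)
```

Arrays in this layout are what would then be passed as the inputs and targets for training.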

Once you have trained a model on the x values and normalized y values and can tell that it is actually learning (converging), you can use model.predict with the preceding x values and preceding normalized y values. Take the output and un-normalize it to get an actual temperature value, or keep the normalized value and feed it back into the model to get the prediction for time t+2.
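The feed-back loop described here could look roughly like the sketch below. `naive_model` is only a stand-in for a trained model's predict call (it repeats the last value), and the mean and standard deviation are invented numbers; in practice they would be the statistics used to normalize the training data.

```python
import numpy as np


def forecast(predict_next, history, window, steps):
    """Predict `steps` values ahead, feeding each prediction back in."""
    buffer = list(np.asarray(history, dtype=float)[-window:])
    predictions = []
    for _ in range(steps):
        next_value = float(predict_next(np.array(buffer[-window:])))
        predictions.append(next_value)
        buffer.append(next_value)  # the prediction becomes an input
    return np.array(predictions)


def denormalize(values, mean, std):
    """Invert standardization: z * std + mean."""
    return np.asarray(values, dtype=float) * std + mean


def naive_model(window_values):
    """Stand-in for a trained model: repeats the last observed value."""
    return window_values[-1]


history_norm = np.array([0.0, 0.5, 1.0, -0.5])
preds_norm = forecast(naive_model, history_norm, window=3, steps=2)

# mean=21.0 and std=2.0 are made-up normalization statistics.
preds_temperature = denormalize(preds_norm, mean=21.0, std=2.0)
```

Errors tend to accumulate when predictions are fed back in, so forecasts further ahead are usually less reliable than the t+1 prediction.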

Upvotes: 0

roman

Reputation: 1091

You can begin with the snippet that you mention in the question.

Any help on how the X and Y in model.fit should look like for my case?

X should be a NumPy array of shape [num_samples, sequence_length, D], where D is the number of values per timestamp. I suppose D=1 in your case, because you only pass the temperature value.

y should be a vector of target values (as in the snippet), either binary (alarm/not_alarm) or continuous (e.g. max temperature deviation). In the latter case you'd need to replace the sigmoid activation with something else.
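To make the shapes concrete, here is a small sketch for the binary case. It assumes one alarm window and one non-alarm window per device, each 4 hours long at 5-minute intervals (48 timestamps). The random values and the device count are placeholders, not real data.

```python
import numpy as np

TIMESTEPS = 4 * 60 // 5  # 48 readings in a 4-hour window
n_devices = 5            # placeholder count

# Placeholder readings; real data would come from the database.
rng = np.random.default_rng(0)
alarm_windows = rng.normal(25.0, 3.0, size=(n_devices, TIMESTEPS))
normal_windows = rng.normal(21.0, 1.0, size=(n_devices, TIMESTEPS))

# Stack all windows and add the feature axis D=1.
X = np.concatenate([alarm_windows, normal_windows], axis=0)[..., np.newaxis]

# One binary label per window: 1 = alarm followed, 0 = no alarm.
y = np.concatenate([np.ones(n_devices), np.zeros(n_devices)])

# X and y are then what would be passed to model.fit(X, y, ...).
```

With more sensors per device, only the last axis of X would grow; y would keep one entry per window.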

Should i normalise the data beforehand

Yes, it's essential to preprocess your raw data. I see 2 crucial things to do here:

  1. Normalise temperature values with min-max scaling or standardization (wiki, sklearn preprocessing). I'd also add a bit of smoothing.
  2. Drop some fraction of the last timestamps from all of the time series to avoid information leakage.
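The two steps above can be sketched with plain NumPy as follows. The helper names, the moving-average smoothing, the window size `k=3`, and the choice to drop the last 6 timestamps are all illustrative assumptions, not recommendations. The statistics are computed on a training subset only, so that the held-out windows don't influence the scaling.

```python
import numpy as np


def standardize(windows, mean, std):
    """Scale to zero mean and unit variance using the given statistics."""
    return (windows - mean) / std


def smooth(windows, k=3):
    """Moving average over time; each row gets shorter by k - 1."""
    kernel = np.ones(k) / k
    return np.stack(
        [np.convolve(row, kernel, mode="valid") for row in windows]
    )


def drop_tail(windows, n_drop):
    """Remove the last n_drop timestamps from every window."""
    return windows[:, :-n_drop] if n_drop > 0 else windows


# Placeholder data: 6 windows of 48 readings each.
rng = np.random.default_rng(0)
windows = rng.normal(21.0, 2.0, size=(6, 48))

# Fit the scaling statistics on the training windows only.
train = windows[:4]
mean, std = train.mean(), train.std()

prepared = drop_tail(smooth(standardize(windows, mean, std), k=3), n_drop=6)
X = prepared[..., np.newaxis]  # (num_samples, sequence_length, 1)
```

Here each window shrinks from 48 to 40 timestamps: 2 are lost to the moving average and 6 are dropped from the end.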

Finally, I'd say that this task is more complex than it seems. You might want to find either a good starter tutorial on time-series classification or a course on machine learning in general. I believe you can find a better method than an RNN.

Upvotes: 2
