Reputation: 43
I am doing a research about time series temperature prediction using Artificial Neural Networks
and most of the references normalize the input values before feeding them into the neural network using the Min-Max Normalization technique
. Both the training and testing data sets were normalized. The input values are a specific day's temperature, dew point, precipitation, pressure, and wind speed values.
In the case where I only have one sample in the test set (for example, I only have today's weather attributes to predict tomorrow's temperature), how can I normalize the values since I would have the same minimum and maximum values for each attribute?
P.S. I already e-mailed the authors of the studies I used and none of them have sent a reply so I figured I would ask for help here :)
Upvotes: 4
Views: 2612
Reputation: 66795
Normalization is performed on training and testing set in the same way, so you compute the "bounds" on the training set, and only apply it to the test set (you should not use test data to calculate those bounds, as you should assume that in the moment of model creation you do not know the test data).
You seem to be missing the core idea of machine learning here. You cannot train a predictive model on one sample. By number of samples we mean the size of the set of observations you collected, not the amount of data feeded to the model (so when you predict tommorows temperature based on todays it does not mean that you have one sample, you need to have lots of samples from the history in order to train any model, and neural net in particular).
So the question of normalization is not really important here - as you simply do it for the whole history set or you can normalize them by hand if you know the exact boundaries of values that each attribute can achieve (for example - you are measuring temperature in Celcius degrees, so it should fall into [-20,40] interval or the samller one if you live in the "softer" part of the world).
Upvotes: 2
Reputation: 14164
Normalize the sample, as though it were in the training or testing datasets. They're the ranges you trained for, right?
Generally, putting limited/partial data into the context of what you can process/ have trained to process will be the only way to get a meaningful or validated output from it.
And of course, you shouldn't be totally limited to just a single sample.. since you should be entirely able to keep (and use) a history of preceding days' samples.
Upvotes: 1