JWick

Reputation: 15

Created a linear model based on existing data, I need to normalize data I want to predict

So I have fitted a linear model of the number of people booked against the time of day. I normalized both variables to the 0-1 range, as is commonly done, and fitted the model using lm(), with bookings as the response I want to predict from the times.

But now I want to predict what bookings might be for later times of day. I need to normalize those times too, but I'm not sure which way. Do I normalize them on their own, or should I add them to the original time data and normalize everything together before predicting? I think the two approaches will give different normalized values, which would affect my prediction.

So basically, which way should the new times be normalized, on their own or as part of the original time data?

Upvotes: 0

Views: 401

Answers (1)

Ben Bolker

Reputation: 226557

Normalize the data according to the min and max of the original data. In order to do this, you must have retained the min and max values of the original data; if you transformed the original variables to a [0,1] scale in place, discarding the original data, you're stuck.

To incorporate your comment: if your original predictor was x0 and you used

(x0-min(x0))/(max(x0)-min(x0)) 

to transform the data for analysis, you would use

(x1-min(x0))/(max(x0)-min(x0)) 

to transform your new variable x1 for prediction (assuming you didn't replace x0 with its scaled version!)
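As a rough illustration of the formulas above, here is a minimal sketch of the whole workflow. The data values and the names x_min, x_rng, train, fit, and new are made up for the example, and the response is left unscaled for simplicity (if you scaled bookings too, you would back-transform the predictions the same way).

```r
# Hypothetical data: times of day (hours) and bookings; values are invented
x0 <- c(9, 10, 11, 12, 13, 14)
y0 <- c(5, 8, 12, 15, 19, 22)

# Keep the original min and range so new data can be put on the same scale
x_min <- min(x0)
x_rng <- max(x0) - min(x0)

# Fit the model on the scaled predictor
train <- data.frame(time = (x0 - x_min) / x_rng, bookings = y0)
fit <- lm(bookings ~ time, data = train)

# New, later times: scaled with the ORIGINAL min and range, not their own
x1 <- c(15, 16, 17)
new <- data.frame(time = (x1 - x_min) / x_rng)
pred <- predict(fit, newdata = new)
```

Note that the scaled new times come out greater than 1 because they lie outside the original range. That is expected: you are extrapolating, and the scaling should not hide that.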

The built-in scale() function attaches attributes of the original data to the transformed data, which are helpful in similarly transforming other data sets (or back-transforming). (Confusingly, the function labels the value that was subtracted from the original value center; scale is the value by which the shifted value was divided. In your case, center is min(x), while scale is max(x)-min(x) = diff(range(x)).)
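To see what those attributes look like, here is a small sketch (the variable names x and sx are just for illustration):

```r
x <- 1:10

# Min-max scaling via scale(): subtract min(x), divide by the range
sx <- scale(x, center = min(x), scale = diff(range(x)))

# The values used for the transformation are stored on the result
attr(sx, "scaled:center")  # 1, i.e. min(x)
attr(sx, "scaled:scale")   # 9, i.e. max(x) - min(x)
```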

dd <- data.frame(x=1:10)
scalefun <- function(x) drop(scale(x,center=min(x),scale=diff(range(x))))
dd <- transform(dd,x=scalefun(x))

Function to back-transform:

unscalefun <- function(x,orig=x) {
   c(x*attr(orig,"scaled:scale") + attr(orig,"scaled:center"))
}

Function to transform according to another data set:

rescalefun <- function(x,orig=x) {
   scale(x,scale=attr(orig,"scaled:scale"),center=attr(orig,"scaled:center"))
}
rescalefun(1:20, orig=dd$x)
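If you want to convince yourself that the stored attributes are enough, a quick check along these lines should do. It is written inline rather than with the helper functions so that it runs on its own; x0, x1, ctr, and scl are illustrative names.

```r
x0 <- 1:10    # "original" data
x1 <- 11:20   # "new" data to put on the same scale

sx0 <- scale(x0, center = min(x0), scale = diff(range(x0)))
ctr <- attr(sx0, "scaled:center")
scl <- attr(sx0, "scaled:scale")

# Back-transform: multiply by the scale, then add the center back
back <- as.numeric(sx0) * scl + ctr
all.equal(back, as.numeric(x0))

# Transform the new data using the ORIGINAL center and scale
sx1 <- as.numeric(scale(x1, center = ctr, scale = scl))
all.equal(sx1, (x1 - min(x0)) / (max(x0) - min(x0)))
```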

Upvotes: 1
