jbest

Reputation: 640

Using the predict function on a new dataset with the same columns but different rows

Using "stackloss" data in R, I created a regression model as seen below:

stackloss.lm = lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data=stackloss)

stackloss.lm
newdata = data.frame(Air.Flow=stackloss$Air.Flow, Water.Temp=stackloss$Water.Temp, Acid.Conc.=stackloss$Acid.Conc.)

Suppose I get a new dataset and need to predict its "stack.loss" using the previous model, as shown below:

#suppose I need to use my model on a new set of data
stackloss$predict1[-1] <- predict(stackloss.lm, newdata)

I get this error:

Error in `$<-.data.frame`(`*tmp*`, "predict1", value = numeric(0)) : 
  replacement has 0 rows, data has 21

Is there a way to use the predict function on a different dataset with the same columns but different rows?

Thanks in advance.

Upvotes: 0

Views: 4868

Answers (1)

MrFlick

Reputation: 206187

You can predict on a new dataset of any length; you just need to make sure you assign the results to an existing vector (or data frame) with the appropriate number of elements.
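As a minimal sketch of that point, the snippet below predicts on a few rows taken from `stackloss` itself (rows 2, 10 and 21 are an arbitrary illustrative choice, and `new_rows` and `pred` are hypothetical names). `predict()` returns one value per row of `newdata`, and columns the model does not use, such as `stack.loss`, are ignored.

```r
# Same model as in the question; stackloss ships with R's datasets package
stackloss.lm <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)

# Hypothetical "new" data: any number of rows, as long as the predictor columns exist.
# The extra stack.loss column is simply ignored by predict().
new_rows <- stackloss[c(2, 10, 21), ]

pred <- predict(stackloss.lm, newdata = new_rows)

# One prediction per row of new_rows, named after new_rows' row names
length(pred) == nrow(new_rows)  # TRUE
names(pred)                     # "2" "10" "21"
```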

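This line causes a problem:

stackloss$predict1[-1] <- predict(stackloss.lm, newdata)

because you can't subset and assign to a column that doesn't exist yet in the same step. stackloss$predict1 is NULL, so the subset-assignment produces a zero-length vector, which is why the error reports "replacement has 0 rows". This also doesn't work: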

dd <- data.frame(a=1:3)
dd$b[-1]<-1:2
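A minimal sketch of one way around this, reusing the `dd` example: create the column first at full length (filled here with `NA`, an arbitrary placeholder), then subset-assign into it. The intermediate `b` vector is a hypothetical name used only to show what happens when the starting point is NULL.

```r
dd <- data.frame(a = 1:3)

# What dd$b[-1] <- 1:2 effectively starts from: dd$b is NULL.
# [-1] on a zero-length vector selects nothing, so the result stays zero-length.
b <- NULL
b[-1] <- 1:2
length(b)  # 0, which $<- then rejects for a 3-row data frame

# Fix: create a full-length column first, then fill only the rows you want
dd$b <- NA
dd$b[-1] <- 1:2
dd$b  # NA 1 2
```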

The stackloss data frame you used to fit the model always has the same number of rows (21), so assigning predictions for a different set of rows to it doesn't make sense. If you want to predict on a smaller dataset, that's fine:

stackloss.lm = lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data=stackloss)

newdata = head(data.frame(Air.Flow=stackloss$Air.Flow, Water.Temp=stackloss$Water.Temp, Acid.Conc.=stackloss$Acid.Conc.), 5)

predict(stackloss.lm, newdata)
       1        2        3        4        5 
38.76536 38.91749 32.44447 22.30223 19.71165 

Since the result has the same number of values as newdata has rows (n=5), it makes sense to attach these to newdata. It would not make sense to attach them to stackloss, because that has a different number of rows (n=21).

newdata$predict1 <- predict(stackloss.lm, newdata)
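A short sketch contrasting the two cases, rebuilding the model and a five-row `newdata` from scratch. `pred`, `sl` (a working copy so the built-in `stackloss` is untouched) and `msg` are hypothetical names. Attaching the five predictions to `newdata` works, while attaching them to the 21-row data frame fails because 21 is not a multiple of 5.

```r
stackloss.lm <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)

# Five rows of predictors, as in the answer
newdata <- head(stackloss[, c("Air.Flow", "Water.Temp", "Acid.Conc.")], 5)
pred <- predict(stackloss.lm, newdata)

# One prediction per row, so it fits as a new column of newdata
newdata$predict1 <- pred

# The same five values do not fit a 21-row data frame
sl <- stackloss
msg <- tryCatch({
  sl$predict1 <- pred
  "no error"
}, error = function(e) conditionMessage(e))
msg  # "replacement has 5 rows, data has 21"
```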

Upvotes: 1
