Georgie Baul

Reputation: 55

predict() should return one value, but returns far too many

I have a dataset of the German soccer league that lists every team with its player value, goals and points. The Freiburg team has scored 19 goals with a player value of 1.12. Using the linear model below, I want to predict how many goals Freiburg could expect with a player value of 5. When I run the code, predict() returns not one value but 18, one for each team. How can I get just the single prediction for Freiburg? (According to the linear model, it should be 27.52.)

m3 <- lm(bundesliga$Goals ~ bundesliga$PlayerValue)
summary(m3)
nd <- data.frame(PlayerValue = 5) 
predict(m3, newdata = nd)


Upvotes: 1

Views: 380

Answers (1)

Edward

Reputation: 18513

You have specified your model in a way that R discourages.

The preferred way is:

m3 <- lm(Goals ~ PlayerValue, data=bundesliga)

Then the prediction works as expected using your command:

nd <- data.frame(PlayerValue = 5) 
predict(m3, newdata = nd)
#       1 
#27.52412 
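
As a quick sanity check, the same number can be reproduced by hand from the fitted coefficients. This is only a sketch using the data listed at the end of this answer; the names b, by_hand and pred are illustrative and not part of the original code.

```r
# Data from this answer, rebuilt with data.frame() for readability
bundesliga <- data.frame(
  PlayerValue = c(10.4, 5.34, 4.38, 4.11, 4.05, 4.01, 3.58, 3.29, 3.21,
                  2.45, 1.98, 1.86, 1.87, 1.58, 1.54, 1.16, 1.16, 1.12),
  Goals = c(34, 32, 34, 35, 32, 16, 26, 27, 23,
            13, 10, 21, 22, 18, 24, 21, 12, 19)
)

m3 <- lm(Goals ~ PlayerValue, data = bundesliga)

# Intercept + slope * 5, computed directly from the coefficients
b <- coef(m3)
by_hand <- unname(b["(Intercept)"] + b["PlayerValue"] * 5)

# The same prediction via predict()
pred <- unname(predict(m3, newdata = data.frame(PlayerValue = 5)))

# Compare the two results (up to floating-point tolerance)
print(all.equal(by_hand, pred))
```

If the two values agree, predict() is using the PlayerValue column of newdata as intended.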

Although the help page for lm says that the data argument is optional, supplying it lets other functions, such as predict, match the variables in newdata by name. There is a note in the help page of predict.lm:

Note: Variables are first looked for in newdata and then searched for in the usual way (which will include the environment of the formula used in the fit). A warning will be given if the variables found are not of the same length as those in newdata if it was supplied.

This is why your original command doesn't work and you get the warning message:

predict(m3, newdata = nd)
       1        2        3        4        5        6        7        8        9 
40.06574 28.31378 26.08416 25.45708 25.31773 25.22483 24.22614 23.55261 23.36681 
      10       11       12       13       14       15       16       17       18 
21.60169 20.51011 20.23140 20.25463 19.58110 19.48820 18.60564 18.60564 18.51274
#Warning message:
#'newdata' had 1 row but variables found have 18 rows

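The predictor in your formula is the expression bundesliga$PlayerValue, not a variable called PlayerValue. predict therefore looks for bundesliga, does not find it in nd, and falls back to the bundesliga data frame in the environment of the formula. The PlayerValue column of nd is never used, so you get the 18 fitted values for the original data.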
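
One way to see this for yourself is to inspect the term labels stored in the fitted model. The following is a minimal sketch, again using the data listed at the end of this answer; m_bad, labels_bad, p_bad and same_as_fitted are illustrative names only.

```r
# Data from this answer, rebuilt with data.frame() for readability
bundesliga <- data.frame(
  PlayerValue = c(10.4, 5.34, 4.38, 4.11, 4.05, 4.01, 3.58, 3.29, 3.21,
                  2.45, 1.98, 1.86, 1.87, 1.58, 1.54, 1.16, 1.16, 1.12),
  Goals = c(34, 32, 34, 35, 32, 16, 26, 27, 23,
            13, 10, 21, 22, 18, 24, 21, 12, 19)
)

# Model specified as in the question, with $ inside the formula
m_bad <- lm(bundesliga$Goals ~ bundesliga$PlayerValue)
nd <- data.frame(PlayerValue = 5)

# The predictor is stored under the full expression, not "PlayerValue"
labels_bad <- attr(terms(m_bad), "term.labels")
print(labels_bad)

# predict() cannot match that label to a column of nd, so it re-evaluates
# bundesliga$PlayerValue and returns one value per row of the original data
p_bad <- suppressWarnings(predict(m_bad, newdata = nd))
print(length(p_bad))

# Those values are simply the fitted values of the model
same_as_fitted <- isTRUE(all.equal(unname(p_bad), unname(fitted(m_bad))))
print(same_as_fitted)
```

Renaming the column in nd does not help here, because the lookup is for the object bundesliga rather than for a column name; refitting with the data argument is the straightforward fix.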


Data:

bundesliga <- structure(list(PlayerValue = c(10.4, 5.34, 4.38, 4.11, 4.05, 4.01, 
3.58, 3.29, 3.21, 2.45, 1.98, 1.86, 1.87, 1.58, 1.54, 1.16, 1.16, 1.12), 
Goals = c(34, 32, 34, 35, 32, 16, 26, 27, 23, 13, 10, 21, 22, 18, 24, 21, 12, 19)), 
class = "data.frame", row.names = c(NA, -18L))

Upvotes: 2
