Reputation: 19
I am trying to predict values for the year 2018 from a Poisson GLM fitted to data for 2012 to 2017.
I have the data below:
   Year Gender Total_Apprentices
1  2012 Female            278290
2  2012   Male            230330
3  2013 Female            231645
4  2013   Male            205521
5  2014 Female            264554
6  2014   Male            233830
7  2015 Female            268593
8  2015   Male            239739
9  2016 Female            264350
10 2016   Male            230532
11 2017 Female            184237
12 2017   Male            191524
This is the code I have written
library("xlsx")
library("tidyverse")
setwd("folder location")
getwd()
# Loading
# xlsx files using xlsx library
f_path <- "filename.xlsx"
my_data <- read.xlsx(f_path, 1, header=TRUE)
plot(my_data)
model1 <- glm(my_data$Total ~ my_data$Year+my_data$Gender,my_data, family= poisson)
summary(model1)
pois.pred <- predict(model1, type="response")
my_data
pois.pred
How would I go about predicting for the year 2018?
I have tried the code below, but it doesn't work:
n_data=data.frame(Year=2018,Gender="Male")
predict(model1, newdata=n_data, type="response")
I get exactly the same output as from this code:
pois.pred <- predict(model1, type="response")
That is, it just returns the fitted values for the observed years 2012 to 2017 instead of a prediction for 2018, and it prints this message:
Warning message: 'newdata' had 1 row but variables found have 12 rows
Upvotes: 1
Views: 51
Reputation: 2650
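The problem is with the glm call, not the predict call. If you reference the columns as my_data$Year etc. in the formula, you will not be able to give the model new data to predict on, because the variables are stored in the model object as my_data$Year and my_data$Gender, not Year and Gender. predict() therefore cannot match them to the columns of newdata and falls back to the original 12 rows, which is what the warning is telling you.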
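To see this for yourself, here is a minimal sketch that rebuilds the question's data inline (an assumption made so the example is self-contained, instead of reading the .xlsx file) and inspects the term labels stored in a model fitted the problematic way. The name bad_model is hypothetical.

```r
# Rebuild the question's data inline (assumed equivalent to the .xlsx contents)
my_data <- data.frame(
  Year = rep(2012:2017, each = 2),
  Gender = rep(c("Female", "Male"), times = 6),
  Total_Apprentices = c(278290, 230330, 231645, 205521, 264554, 233830,
                        268593, 239739, 264350, 230532, 184237, 191524)
)

# Fit with my_data$... inside the formula, as in the question
bad_model <- glm(my_data$Total_Apprentices ~ my_data$Year + my_data$Gender,
                 family = poisson)

# The stored term labels carry the my_data$ prefix, so newdata columns
# named Year and Gender can never match them
labels <- attr(terms(bad_model), "term.labels")
labels
```

The labels come back as "my_data$Year" and "my_data$Gender", which is why a newdata frame with columns Year and Gender is ignored.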
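If you change the call to:
model1 <- glm(Total_Apprentices ~ Year + Gender,
              data = my_data, family = poisson)
then prediction on new data will work, because the formula now uses the bare column names and predict() can look them up in newdata.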
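Putting it together, the following is a sketch of the full workflow under the assumption that the data frame matches the table in the question (it is rebuilt inline here so the example runs without the .xlsx file). The names n_data and pred_2018 are only illustrative.

```r
# Rebuild the question's data inline (assumed equivalent to the .xlsx contents)
my_data <- data.frame(
  Year = rep(2012:2017, each = 2),
  Gender = rep(c("Female", "Male"), times = 6),
  Total_Apprentices = c(278290, 230330, 231645, 205521, 264554, 233830,
                        268593, 239739, 264350, 230532, 184237, 191524)
)

# Bare column names in the formula; the data frame goes in the data argument
model1 <- glm(Total_Apprentices ~ Year + Gender,
              data = my_data, family = poisson)

# newdata must use the same column names as the formula
n_data <- data.frame(Year = 2018, Gender = c("Female", "Male"))

# type = "response" returns expected counts rather than log-scale values
pred_2018 <- predict(model1, newdata = n_data, type = "response")
pred_2018
```

This returns one predicted count per row of n_data, with no warning about mismatched row counts. Keep in mind that 2018 lies outside the fitted range, so the result is an extrapolation of the fitted trend.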
Upvotes: 1