Ashikur Rahman

Reputation: 19

How to predict with a generalized linear model in R for a given data point

I am trying to predict for the year 2018 with a Poisson GLM fitted to my dataset.

I have the following data:

        Year        Gender    Total_Apprentices
    1   2012        Female            278290
    2   2012          Male            230330
    3   2013        Female            231645
    4   2013          Male            205521
    5   2014        Female            264554
    6   2014          Male            233830
    7   2015        Female            268593
    8   2015          Male            239739
    9   2016        Female            264350
    10  2016          Male            230532
    11  2017        Female            184237
    12  2017          Male            191524

This is the code I have written

    library("xlsx")
    library("tidyverse")

    setwd("folder location") 
    getwd()
    # Loading

    # xlsx files using xlsx library

    f_path <- "filename.xlsx"

    my_data <- read.xlsx(f_path, 1, header=TRUE)
    plot(my_data)

    model1 <- glm(my_data$Total ~ my_data$Year+my_data$Gender,my_data, family= poisson)


    summary(model1)

    pois.pred <- predict(model1, type="response")

    my_data
    pois.pred

How would I go about predicting for the year 2018?

I have tried the code below, but it doesn't work:

    n_data=data.frame(Year=2018,Gender="Male")
    predict(model1, newdata=n_data, type="response")

I get exactly the same output as from this code:

    pois.pred <- predict(model1, type="response")

That is, it just returns the fitted values for my observed data from 2012 to 2017, and there is a warning:

    Warning message: 'newdata' had 1 row but variables found have 12 rows

Upvotes: 1

Views: 51

Answers (1)

DS_UNI

Reputation: 2650

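The problem is with the glm call, not the predict call. If you reference the data frame inside the formula (my_data$Year and so on), you cannot give the model new data to predict on, because the variables are stored in the model object as my_data$Year etc., not as Year and Gender. predict() then finds no matching columns in newdata and falls back to the original 12-row my_data, which is what the warning is telling you.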
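To see this for yourself, here is a minimal sketch that rebuilds the data from the question inline (an assumption, so it runs without the xlsx file) and fits the model the way the question does. The names `bad_model`, `bad_labels` and `bad_pred` are made up for illustration, and I have spelled out `my_data$Total_Apprentices` in full rather than relying on partial matching of `my_data$Total`.

```r
# Data from the question, typed in by hand so no Excel file is needed.
my_data <- data.frame(
  Year = rep(2012:2017, each = 2),
  Gender = rep(c("Female", "Male"), times = 6),
  Total_Apprentices = c(278290, 230330, 231645, 205521, 264554, 233830,
                        268593, 239739, 264350, 230532, 184237, 191524)
)

# Same style of call as in the question: my_data$ inside the formula.
bad_model <- glm(my_data$Total_Apprentices ~ my_data$Year + my_data$Gender,
                 data = my_data, family = poisson)

# The term names stored in the model carry the "my_data$" prefix.
bad_labels <- attr(terms(bad_model), "term.labels")
print(bad_labels)

# newdata has columns Year and Gender, which the model never looks up,
# so predict() evaluates my_data$Year etc. again and returns 12 values.
n_data <- data.frame(Year = 2018, Gender = "Male")
bad_pred <- suppressWarnings(
  predict(bad_model, newdata = n_data, type = "response")
)
length(bad_pred)
```

The warning is suppressed here only to keep the output short; without `suppressWarnings()` you should see the same message as in the question.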

If you change the call to:

    model1 <- glm(Total_Apprentices ~ Year + Gender,
                  data = my_data, family = poisson)

then prediction on new data will work, as long as the columns in newdata are named Year and Gender.
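Putting it together, a self-contained sketch of the corrected workflow could look like the following. The data frame is again typed in from the question rather than read from the xlsx file, and `pred_2018` is just a name I picked; predicting for both genders at once is my addition, not something the question asked for.

```r
# Data from the question, typed in by hand so no Excel file is needed.
my_data <- data.frame(
  Year = rep(2012:2017, each = 2),
  Gender = rep(c("Female", "Male"), times = 6),
  Total_Apprentices = c(278290, 230330, 231645, 205521, 264554, 233830,
                        268593, 239739, 264350, 230532, 184237, 191524)
)

# Bare column names in the formula; the data frame goes in `data`.
model1 <- glm(Total_Apprentices ~ Year + Gender,
              data = my_data, family = poisson)

# New rows must use the same column names as the formula.
n_data <- data.frame(Year = 2018, Gender = c("Female", "Male"))

# type = "response" gives predictions on the count scale, not the log scale.
pred_2018 <- predict(model1, newdata = n_data, type = "response")
cbind(n_data, Predicted = pred_2018)
```

This returns one prediction per row of `n_data`, with no warning about row counts.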

Upvotes: 1
