Ivan

Reputation: 76

Difference between two predict functions

Hello experts,

I was testing a logistic regression model that I fit on a training dataset. I know that the predict() function with type="response" gives the probability of an event happening (in this case, an employee leaving the company).

I am also aware of a newer package, tidypredict, released in January 2019, which as I understand it also predicts the probability of an event happening, with a 95% interval.

When I tried the two methods, they gave different probabilities for the same employee.

I researched the topic. It seems that predict() is best used when the final outcome is already known, because we can then compare the predictions with the outcome and see how accurate the model is.

tidypredict, as far as I can tell, is meant for cases where the outcome is unknown. Could anyone please tell me what the difference is? Here is the documentation for both:

tidypredict: https://cran.r-project.org/web/packages/tidypredict/tidypredict.pdf

predict(): https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.glm.html

Here are the results, for anyone interested:

predict() (test.model):
1         2         3         4         5         6 
0.6633092 0.2440294 0.2031897 0.9038319 0.8374229 0.1735053 

tidypredict (fit column):

    Age         Los Gender Minority test.model       fit
1 xx.xx ThreeToFive   Male Minority  0.6633092 0.7116757
2 xx.xx   ZeroToOne   Male Minority  0.2440294 0.6834286
3 xx.xx   ZeroToOne Female Minority  0.2031897 0.6303713
4 xx.xx TentoTwenty   Male Minority  0.9038319 0.6963801
5 xx.xx ThreeToFive   Male Minority  0.8374229 0.8658365
6 xx.xx   ZeroToOne Female Minority  0.1735053 0.5840209

Here is the code I used:

    library(dplyr)
    library(tidypredict)

    # Logistic model
    model1 <- glm(Leave ~ ., family = "binomial", data = train)

    # predict() function
    test.model <- predict(model1, newdata = test1, type = "response")

    # tidypredict function
    emp_risk <- test1 %>%
      tidypredict_to_column(model1)


Answers (1)

Chase

Reputation: 69201

I'm not able to reproduce your problem. Here's a reproducible example showing that the predictions from predict() match those of tidypredict_to_column(). My advice: pick a specific row that doesn't match and work out where the difference comes from. If you post a reproducible example, you'll get more specific help:

library(titanic)
library(dplyr)
library(tidypredict)
d <- titanic_train
mod <- glm(Survived ~ Pclass + Sex + Age + SibSp + Parch, data = d, family = "binomial")

d <- d %>% tidypredict_to_column(mod)
d$fit2 <- predict(mod, newdata = d, type = "response")
summary(d$fit - d$fit2)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#>       0       0       0       0       0       0     177

Created on 2019-04-01 by the reprex package (v0.2.1)
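
One possible way to dig into a row that doesn't match is to rebuild the probability by hand and see which of the two numbers it agrees with. The sketch below uses only base R; the built-in mtcars data and the names mod, p_predict and p_manual are stand-ins I picked for illustration, not anything from your data. It relies on the fact that, for a binomial glm with the default logit link, the type = "response" prediction is the inverse logit of the linear predictor:

```r
# Fit a small logistic regression on a built-in dataset (a stand-in for `train`).
mod <- glm(am ~ wt + hp, data = mtcars, family = "binomial")

# Probability as reported by predict().
p_predict <- predict(mod, newdata = mtcars, type = "response")

# The same probability rebuilt by hand:
# design matrix times coefficients, then the inverse logit (plogis).
X <- model.matrix(~ wt + hp, data = mtcars)
p_manual <- plogis(drop(X %*% coef(mod)))

# Largest absolute disagreement, and the rows (if any) above a small tolerance.
max_diff <- max(abs(p_predict - p_manual))
mismatch_rows <- which(abs(p_predict - p_manual) > 1e-8)

print(max_diff)
print(mismatch_rows)
```

If you run the same comparison on your own model and test1, any row where the hand-built value disagrees with one of the two columns is a good candidate for the reproducible example.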

