Reputation: 76
experts!
I was testing a logistic regression model on a training dataset. I know that the predict() function with type = "response" gives me the probability of an event happening (in this case, an employee leaving the company).
I was also aware of a new package called tidypredict, released in January 2019, which also predicts the probability of an event, optionally with a 95% interval.
When I tried these two methods, they gave different probabilities for the same employee.
I researched the topic. It seems that the best time to use predict() is when the final outcome is already known, because then we can compare the predictions with the outcomes and find out how accurate the model is.
tidypredict, on the other hand, is used when the outcome is unknown. Could anyone please tell me what the difference is? The package documentation is available here: https://cran.r-project.org/web/packages/tidypredict/tidypredict.pdf
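For reference, this is what predict() computes for a binomial glm: with type = "link" (the default) it returns the log-odds (the linear predictor), and with type = "response" it applies the inverse logit to that, so the result is a probability. A minimal sketch, using the built-in mtcars data as a stand-in for the employee data (an assumption, since the real data isn't posted):

```r
# Stand-in data: the built-in mtcars set (the employee data is not available).
mod <- glm(am ~ hp + wt, data = mtcars, family = "binomial")

# type = "link" (the default) returns the log-odds, i.e. the linear predictor.
eta <- predict(mod, newdata = mtcars, type = "link")

# type = "response" returns probabilities: the inverse logit of the log-odds.
p_response <- predict(mod, newdata = mtcars, type = "response")
p_manual <- plogis(eta)  # plogis(x) = 1 / (1 + exp(-x))

head(cbind(eta, p_response, p_manual))
```

Any other tool that reports "the probability" for the same model and the same rows should reproduce p_response.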
Here are the results, for anyone interested:
predict() (test.model):
        1         2         3         4         5         6
0.6633092 0.2440294 0.2031897 0.9038319 0.8374229 0.1735053
Tidypredict:
  Age   Los         Gender Minority test.model       fit
1 xx.xx ThreeToFive Male   Minority  0.6633092 0.7116757
2 xx.xx ZeroToOne   Male   Minority  0.2440294 0.6834286
3 xx.xx ZeroToOne   Female Minority  0.2031897 0.6303713
4 xx.xx TentoTwenty Male   Minority  0.9038319 0.6963801
5 xx.xx ThreeToFive Male   Minority  0.8374229 0.8658365
6 xx.xx ZeroToOne   Female Minority  0.1735053 0.5840209
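To put a number on the gap, here is a small check on the six values posted above. compare_predictions() is a hypothetical helper written for this sketch, not part of either package:

```r
# Values copied from the question: predict(type = "response") vs tidypredict's fit
pred_glm  <- c(0.6633092, 0.2440294, 0.2031897, 0.9038319, 0.8374229, 0.1735053)
pred_tidy <- c(0.7116757, 0.6834286, 0.6303713, 0.6963801, 0.8658365, 0.5840209)

# Hypothetical helper: list the rows where two prediction vectors disagree
# by more than `tol` (rows where either value is NA are skipped by which()).
compare_predictions <- function(a, b, tol = 1e-6) {
  delta <- b - a
  bad <- which(abs(delta) > tol)
  data.frame(row = bad, a = a[bad], b = b[bad], delta = delta[bad])
}

mismatches <- compare_predictions(pred_glm, pred_tidy)
mismatches
```

All six rows differ, by up to about 0.44 (row 2), and the differences change sign, so this is neither rounding nor a constant shift.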
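library(dplyr)        # for the %>% pipe
library(tidypredict)

# Logistic regression model
model1 <- glm(Leave ~ ., family = "binomial", data = train)

# Predicted probabilities with predict()
test.model <- predict(model1, newdata = test1, type = "response")

# Predicted probabilities with tidypredict (adds a "fit" column)
emp_risk <- test1 %>%
  tidypredict_to_column(model1)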
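As far as I understand it, tidypredict does not call predict() at all: it translates the fitted model into an R formula (which you can inspect with tidypredict_fit(model1)) and evaluates that formula against the data. The base-R sketch below only imitates that idea for a model with numeric predictors; factor predictors such as Los or Gender would need extra dummy-variable terms, which this sketch leaves out. It uses mtcars as stand-in data:

```r
# Stand-in model with numeric predictors only (factors would need dummy terms).
mod <- glm(am ~ hp + wt, data = mtcars, family = "binomial")
b <- coef(mod)

# Linear predictor as an R expression: b0 + b1 * hp + b2 * wt
slope_terms <- Map(function(name, value) call("*", value, as.name(name)),
                   names(b)[-1], b[-1])
lin_pred <- Reduce(function(acc, term) call("+", acc, term), slope_terms, b[[1]])

# Inverse logit turns the log-odds into a probability
fit_expr <- call("/", 1, call("+", 1, call("exp", call("-", lin_pred))))
print(fit_expr)

# Evaluate the expression with the data frame's columns in scope
fit <- eval(fit_expr, envir = mtcars)
```

If the formula printed by tidypredict_fit(model1) did not line up with coef(model1), that would point to where the two methods diverge.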
Reputation: 69201
I'm not able to reproduce your problem. Here's a reproducible example illustrating that the predictions from predict() match those of tidypredict_to_column(). My advice: dig into a specific example that doesn't match and figure out the difference. If you post a reproducible example, you'll get more specific help:
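library(titanic)
library(dplyr)
library(tidypredict)
d <- titanic_train
mod <- glm(Survived ~ Pclass + Sex + Age + SibSp + Parch, data = d, family = "binomial")
# tidypredict: adds a "fit" column with the predicted probability
d <- d %>% tidypredict_to_column(mod)
# base R: predicted probabilities on the response scale
d$fit2 <- predict(mod, newdata = d, type = "response")
# every difference is 0; the NAs are passengers with a missing Age
summary(d$fit - d$fit2)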
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 0 0 0 0 0 0 177
Created on 2019-04-01 by the reprex package (v0.2.1)
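Following the advice above, one way to dig into a single mismatching row is to recompute its probability by hand from the coefficients and the design matrix, then see which method agrees. A sketch using the built-in infert data as a stand-in; it has a factor predictor (education), so the dummy coding that factors like Los or Gender get is visible too. The row index i is an arbitrary choice:

```r
# Stand-in data: the built-in infert set, with one factor predictor (education).
mod <- glm(case ~ spontaneous + induced + education, data = infert,
           family = "binomial")

i <- 5  # the row to inspect (arbitrary)

# Design-matrix row: intercept, numeric predictors, and dummy columns for education
x_i <- model.matrix(mod)[i, ]
print(x_i)

# Probability by hand: inverse logit of sum(coefficient * value)
p_manual <- plogis(sum(coef(mod) * x_i))
p_predict <- unname(predict(mod, type = "response")[i])
c(manual = p_manual, predict = p_predict)
```

Whichever method disagrees with the by-hand number for that row is the one to investigate (for example, how it coded the factor levels).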