I have created a logistic regression model in R to try to predict the outcome of cricket matches. However, my model produces probability values greater than 1; for example, the output is 1.031704. Any tips on how I could improve my model to get an accurate estimate of the probability?
```r
set.seed(1)
# Use 70% of dataset as training set and remaining 30% as testing set
sample <- sample(c(TRUE, FALSE), nrow(ODIMT), replace = TRUE, prob = c(0.7, 0.3))
train <- ODIMT[sample, ]
test <- ODIMT[!sample, ]

model <- glm(Result ~ Target + Opposition + Country, family = "binomial", data = ODIMT)
options(scipen = 999)
summary(model)

# Model diagnostics: pseudo-R^2, variable importance, variance inflation factors
pscl::pR2(model)["McFadden"]
caret::varImp(model)
car::vif(model)

new <- data.frame(Target = 226, Opposition = "v India", Country = "England")
predict(model, new, type = "response")
```
The Result variable is 0 or 1, Target ranges from 0 to 400, and Opposition and Country are character variables.
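Data:

| Country      | Target | Result | Opposition  | Ground    |
|--------------|--------|--------|-------------|-----------|
| England      | NA     | 1      | v India     | Kolkata   |
| Australia    | 251    | 0      | v Pakistan  | Kolkata   |
| South Africa | 168    | 0      | v India     | Delhi     |
| Bangladesh   | NA     | 1      | v Pakistan  | Delhi     |
| England      | 306    | 0      | v Australia | Melbourne |
| New Zealand  | NA     | 1      | v Sri Lanka | Melbourne |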
Output of summary:
I think you are predicting the log-odds value. From the `predict.glm` documentation for the `type` argument:
> the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.
As noted in the comments, if you use type="response" you get the predicted probabilities.
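A minimal sketch of the difference, using simulated data because `ODIMT` isn't available. The column names mirror the question, but the values and the assumed relationship between `Target` and `Result` are made up for illustration. It also fits on the training rows only (the question's `glm()` call uses `data=ODIMT`, so the 70/30 split is never used) and checks that `plogis()`, the inverse logit, turns the default log-odds prediction into the `type = "response"` probability.

```r
# Hypothetical simulated data standing in for ODIMT (the real data is not available)
set.seed(1)
n <- 200
sim <- data.frame(
  Target     = round(runif(n, 150, 350)),
  Opposition = sample(c("v India", "v Pakistan", "v Australia"), n, replace = TRUE),
  Country    = sample(c("England", "Australia", "South Africa"), n, replace = TRUE)
)
# Assumed (made-up) relationship: a higher target makes Result = 1 less likely
sim$Result <- rbinom(n, 1, plogis(3 - 0.012 * sim$Target))

# 70/30 split as in the question, but the model is fit on the training rows only
in_train <- sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.7, 0.3))
train <- sim[in_train, ]
test  <- sim[!in_train, ]
model <- glm(Result ~ Target + Opposition + Country, family = "binomial", data = train)

new <- data.frame(Target = 226, Opposition = "v India", Country = "England")

link <- predict(model, new)                     # default type = "link": log-odds, any real number
prob <- predict(model, new, type = "response")  # predicted probability, always between 0 and 1

# The response-scale prediction is the inverse logit of the link-scale prediction
print(all.equal(unname(plogis(link)), unname(prob)))

# Held-out predictions on the probability scale
test_prob <- predict(model, test, type = "response")
print(range(test_prob))
```

For a `binomial` glm, a value outside [0, 1] therefore points to a prediction on the link scale (or to a model that is not a binomial glm, e.g. one fit with `lm()`), rather than to a problem with the predictors themselves.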
Have a look at this question for more info.