Reputation: 12178
I am running a Bayesian logit with MCMCpack::MCMClogit
. The syntax is easy and follows lm()
or glm()
, but I can't find any equivalent of the predict.glm
function. Is there any way of predicting the probabilities of the outcomes in MCMClogit
for each unit of observation in the dataframe? predict()
is especially useful for validating training data from new data, which is what I ultimately have to do.
df = read.csv("http://dl.dropbox.com/u/1791181/MCMC.csv")#Read in data
model.glm = glm(SECONDARY.LEVEL ~ AGE + SEX, data=df, family=binomial(link=logit))
glm.predict = predict(model.glm, type="response")
For MCMClogit():
model.mcmc = MCMClogit(SECONDARY.LEVEL ~ AGE + SEX, data=df, mcmc=1000)
Upvotes: 4
Views: 3299
Reputation: 23976
You could use the posterior distribution of model parameters produced by MCMC to get a distribution of predictions, using the logistic function.
For instance, if your model formula is y ~ x1 + x2 + x3
, and your MCMC output is stored in the variable posterior.mcmc
, then you could use
function(x1, x2, x3) 1 / (1 + exp(-posterior.mcmc %*% rbind(1, x1, x2, x3)))
to give the distribution analogous to predict.glm(., 'response')
More detailed example for the case of a single input variable:
library(extraDistr)
library(MCMCpack)
# Take x uniformly distributed between -100 and 100
x <- runif(2000, min=-100, max=100)
# Generate a response which is logistic with some noise
beta <- 1/8
eps <- rnorm(length(x), 0, 1)
p <- function(x, eps) 1 / (1 + exp(-beta*x + eps))
p.x <- p(x, eps)
y <- sapply(p.x, function(p) rbern(1, p))
df1 <- data.frame(x, y)
# Fit by logistic regression
glm.logistic <- glm(y ~ x, df1, family=binomial)
# MCMC gives a distribution of values for the model parameters
posterior.mcmc <- MCMClogit(y ~ x, df1, verbose=2000)
densplot(posterior.mcmc)
# Thus, we have a distribution of model predictions for each x
predict.p.mcmc <- function(x) 1 / (1 + exp(-posterior.mcmc %*% rbind(1,x)))
interval.p.mcmc <- function(x, low, high) apply(predict.p.mcmc(x), 2,
function(x) quantile(x, c(low, high)))
predict.y.mcmc <- function(x) posterior.mcmc %*% rbind(1,x)
interval.y.mcmc <- function(x, low, high) apply(predict.y.mcmc(x), 2,
function(x) quantile(x, c(low, high)))
## Plot the data and fits ##
plot(x, p.x, ylab = 'Pr(y=1)', pch = 20, cex = 0.5, main = 'Probability vs x')
# x-values for prediction
x_test <- seq(-100, 100, 0.01)
# Blue line is the logistic function we used to generate the data, with noise removed
p_of_x_test <- p(x_test, 0)
lines(x_test, p_of_x_test, col = 'blue')
# Green line is the prediction from logistic regression
lines(x_test, predict(glm.logistic, data.frame(x = x_test), 'response'), col = 'green')
# Red lines indicates the range of model predictions from MCMC
# (for each x, 95% of the distribution of model predictions lies between these bounds)
interval.p.mcmc_95 <- interval.p.mcmc(x_test, 0.025, 0.975)
lines(x_test, interval.p.mcmc_95[1,], col = 'red')
lines(x_test, interval.p.mcmc_95[2,], col = 'red')
# Similarly for the log-odds
plot(x, log(p.x/(1 - p.x)), ylab = 'log[Pr(y=1) / (1 - Pr(y=1))]',
pch = 20, cex = 0.5, main = 'Log-Odds vs x')
lines(x_test, log(p_of_x_test/(1 - p_of_x_test)), col = 'blue')
lines(x_test, predict(glm.logistic, data.frame(x = x_test)), col = 'green')
interval.y.mcmc_95 <- interval.y.mcmc(x_test, 0.025, 0.975)
lines(x_test, interval.y.mcmc_95[1,], col = 'red')
lines(x_test, interval.y.mcmc_95[2,], col = 'red')
Upvotes: 2
Reputation: 12411
The description of the function says :
This function generates a sample from the posterior distribution of a logistic regression model using a random walk Metropolis algorithm.
I think therefore that your model.mcmc
already contains the points that MCMClogit()
has simulated.
You can use str
to see what it contains and summary
and plot
functions on it like in the example there : http://cran.r-project.org/web/packages/MCMCpack/MCMCpack.pdf
Upvotes: 0