Reputation: 11686
I have a logistic model fit, say myfit
that I've saved. The data frame I'm using is in the format of (where the first column is the outcome).
medical10 age female nonwhite bmi smoked condxs insuredd smi2d
1 0 60 0 1 29.97 0 0 1 0
2 0 42 0 1 25.85 1 3 1 1
3 0 62 1 0 25.06 0 1 1 0
4 0 62 0 0 36.27 0 2 0 0
5 0 32 0 0 33.36 0 0 1 0
6 0 41 0 0 21.70 1 0 0 0
...
What I would like to do is to make a logistic plot (in this form: http://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html) for each combination of variables.
Since there are 8 variables, there are 2^8 permutations of having one variable on the x-axis while holding the other 7 constant. Is there a way I can automate the plot using ggplot2?
For instance, if 'x' was age, I would get the mean of bmi, and then pick 0 for female, 0 for nonwhite, 0 for smoked, 0 for condxs, 0 for insuredd and 0 for smi2d. I would then do a prediction and make a ggplot of x vs y.
However, this is quite tedious and I was hoping there was a better way?
Upvotes: 0
Views: 561
Reputation: 2230
An update to the R rms
package to be posted on CRAN on about 2015-01-01 includes a new function ggplot.Predict
(called by ggplot()
) that provides a general way to generate such curves using ggplot2
, handling multiple moving variables, interactions, etc. You can see some example usage at https://github.com/harrelfe/rms/blob/master/man/ggplot.Predict.Rd . You can do all this with the current version of rms
using lattice
graphics and the plot.Predict
function.
Upvotes: 3
Reputation: 206187
I don't know of anything particular in ggplot that will make this easy. But I did find a way (though it was more work than I was expecting. Perhaps others can improve. Anyway, first let's define a more useful set of sample data
N<-100
set.seed(15)
invlogit <- function(x) exp(x)/(exp(x)+1)
dd <- transform(data.frame(
age=runif(N,30,60),
female=sample(0:1, N, replace=T),
white=sample(c("Y","N"), N, replace=T),
bmi=rnorm(N,30,2)),
medical=as.numeric(invlogit((-60+2*age-1.5*bmi+3*female)/10)>runif(N)))
fit<-glm(medical~. ,dd, family=binomial)
So now we have some data and a model. Now i'll define a helper function that will predict values for a single variable while holding the others at the mean value.
predictone<-function(fit, var, xlim=NULL, fix=list(), n=101,
xname=var, type="response") {
tt <- terms(fit)
vv <- as.list(attr(tt, "variables"))[-c(1,attr(tt, "response")+1)]
vn <- sapply(vv, deparse)
stopifnot(var %in% vn)
others <- vn[vn != var]
def<-lapply(others, function(x) {
if(x %in% names(fix)) {
if(is.factor(val)) {
stopifnot(fix[[x]] %in% levels(val))
val[val==fix[[x]]][1]
} else {
fix[[x]]
}
} else {
val <- fit$data[[x]]
if(is.factor(val)) {
val[val==names(sort(table(val))[1])][1]
} else {
mean(val)
}
}
})
if(is.factor(fit$data[[var]])) {
newdata <- data.frame(def, unique(fit$data[[var]]))
} else {
if(is.null(xlim)) {
xlim <- range(fit$data[[var]])
}
newdata <- data.frame(def, seq(min(xlim), max(xlim), length.out=n))
}
names(newdata)<-c(others, var)
pp<-data.frame(newdata[[var]], predict(fit,newdata, type=type))
names(pp)<-c(xname, type)
attr(pp,"fixed")<-setNames(def, others)
pp
}
Basically this function exists to calculate the averages of all the other variables and then do the actual prediction. We can use it with the test data to make a bunch of plots with
plots<-lapply(names(dd)[1:4], function(x) {
if(is.factor(dd[[x]])) {
ggplot(predictone(fit, x), aes_string(x=x, y="response")) + geom_point()
} else {
ggplot(predictone(fit, x), aes_string(x=x, y="response")) + geom_line()
}
})
require(gridExtra)
do.call(grid.arrange, plots)
which will return
Note that factors are treated differently than regular numeric values. When you code categorical variables with 0/1 R can't tell they are categorical so it doesn't do a good job of inferring the values which make sense. I would encourage you to convert 0/1 values to a proper factor variable.
Upvotes: 3