Plotting Logistic Equation Fit or many variables in ggplot2

Question

I have a logistic model fit, say myfit that I've saved. The data frame I'm using is in the format of (where the first column is the outcome).

  medical10 age female nonwhite   bmi smoked condxs insuredd smi2d
1         0  60      0        1 29.97      0      0        1     0
2         0  42      0        1 25.85      1      3        1     1
3         0  62      1        0 25.06      0      1        1     0
4         0  62      0        0 36.27      0      2        0     0
5         0  32      0        0 33.36      0      0        1     0
6         0  41      0        0 21.70      1      0        0     0
...

What I would like to do is to make a logistic plot (in this form: http://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html) for each combination of variables.

Since there are 8 variables, there are 2^8 permutations of having one variable on the x-axis while holding the other 7 constant. Is there a way I can automate the plot using ggplot2?

For instance, if 'x' was age, I would get the mean of bmi, and then pick 0 for female, 0 for nonwhite, 0 for smoked, 0 for condxs, 0 for insuredd and 0 for smi2d. I would then do a prediction and make a ggplot of x vs y.

However, this is quite tedious and I was hoping there was a better way?

MrFlick · Accepted Answer

I don't know of anything particular in ggplot that will make this easy. But I did find a way (though it was more work than I was expecting. Perhaps others can improve. Anyway, first let's define a more useful set of sample data

N<-100
set.seed(15)
invlogit <- function(x) exp(x)/(exp(x)+1)
dd <- transform(data.frame(
    age=runif(N,30,60),
    female=sample(0:1, N, replace=T),
    white=sample(c("Y","N"), N, replace=T),
    bmi=rnorm(N,30,2)),
    medical=as.numeric(invlogit((-60+2*age-1.5*bmi+3*female)/10)>runif(N)))

fit<-glm(medical~. ,dd, family=binomial)

So now we have some data and a model. Now i'll define a helper function that will predict values for a single variable while holding the others at the mean value.

predictone<-function(fit, var, xlim=NULL, fix=list(), n=101, 
    xname=var, type="response") {

    tt <- terms(fit)
    vv <- as.list(attr(tt, "variables"))[-c(1,attr(tt, "response")+1)]
    vn <- sapply(vv, deparse)
    stopifnot(var %in% vn)
    others <- vn[vn != var]
    def<-lapply(others, function(x) {
        if(x %in% names(fix)) {
            if(is.factor(val)) {
                stopifnot(fix[[x]] %in% levels(val))
                val[val==fix[[x]]][1]
            } else {
                fix[[x]]    
            }
        } else {
            val <- fit$data[[x]]
            if(is.factor(val)) {
                val[val==names(sort(table(val))[1])][1]
            } else {
                mean(val)
            }
        }
    })

    if(is.factor(fit$data[[var]])) {
        newdata <- data.frame(def, unique(fit$data[[var]]))     
    } else {
        if(is.null(xlim)) {
            xlim <- range(fit$data[[var]])
        }
        newdata <- data.frame(def, seq(min(xlim), max(xlim), length.out=n)) 
    }
    names(newdata)<-c(others, var)
    pp<-data.frame(newdata[[var]], predict(fit,newdata, type=type))
    names(pp)<-c(xname, type)
    attr(pp,"fixed")<-setNames(def, others)
    pp
}

Basically this function exists to calculate the averages of all the other variables and then do the actual prediction. We can use it with the test data to make a bunch of plots with

plots<-lapply(names(dd)[1:4], function(x) {
    if(is.factor(dd[[x]])) {
        ggplot(predictone(fit, x), aes_string(x=x, y="response")) + geom_point()
    } else {
        ggplot(predictone(fit, x), aes_string(x=x, y="response")) + geom_line() 
    }
})

require(gridExtra)
do.call(grid.arrange,  plots)

which will return

enter image description here

Note that factors are treated differently than regular numeric values. When you code categorical variables with 0/1 R can't tell they are categorical so it doesn't do a good job of inferring the values which make sense. I would encourage you to convert 0/1 values to a proper factor variable.

Plotting Logistic Equation Fit or many variables in ggplot2

Answers (2)

Related Questions