Guilherme D. Garcia

Reputation: 179

Logistic regression with categorical predictors using JAGS

I'm new to JAGS and I'm trying to predict a binary outcome (0/1) from 9 non-continuous predictors, whose values may be 0, 1, or 2. This is my first time doing this, and even though I can get the model to run, I'm sure there are a number of issues here.

Data file sample (list)

$y
[1] 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0
[29] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0

$N
[1] 50

$oAnt
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1
[29] 1 1 1 1 1 1 2 1 1 0 1 1 1 1 1 2 1 1 1 1 1 1

$nAnt
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[29] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

$cAnt
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
[29] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0

$oPen
[1] 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 1 1 0
[29] 1 0 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 0 2 1 1

$nPen
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 2 1
[29] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1

$cPen
[1] 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0
[29] 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0

$oFin
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
[29] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

$nFin
[1] 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1
[29] 1 1 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 3

$cFin
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1
[29] 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0

Model

model {
    for (i in 1:N) {
        y[i] ~ dbern(mu[i])
        mu[i] <- 1 / (1 + exp(-(b0 + b1*oAnt[i] + b2*nAnt[i] + b3*cAnt[i] + b4*oPen[i] + b5*nPen[i] + b6*cPen[i] + b7*oFin[i] + b8*nFin[i] + b9*cFin[i])))
    }
    b0 ~ dnorm(0, 1.0e-12)
    b1 ~ dnorm(0, 1.0e-12)
    b2 ~ dnorm(0, 1.0e-12)
    b3 ~ dnorm(0, 1.0e-12)
    b4 ~ dnorm(0, 1.0e-12)
    b5 ~ dnorm(0, 1.0e-12)
    b6 ~ dnorm(0, 1.0e-12)
    b7 ~ dnorm(0, 1.0e-12)
    b8 ~ dnorm(0, 1.0e-12)
    b9 ~ dnorm(0, 1.0e-12)
}
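(As an aside, JAGS supports the logit link directly on the left-hand side, so the hand-coded inverse-logit line can be written equivalently, and a bit more robustly against overflow, as:)

```
logit(mu[i]) <- b0 + b1*oAnt[i] + b2*nAnt[i] + b3*cAnt[i] + b4*oPen[i] +
                b5*nPen[i] + b6*cPen[i] + b7*oFin[i] + b8*nFin[i] + b9*cFin[i]
```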

I used estimates from a glm() fit as inits (as suggested by A. Gelman), but for the sake of simplicity, let's just assume I'll let JAGS pick the initial values for the chains.
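For reference, the glm() trick can be sketched as below. The synthetic `dataList`, the `initsList` helper, and the jitter sd of 0.1 are my own stand-ins for illustration, not from the original post:

```r
# Synthetic stand-in for the question's data list
set.seed(1)
dataList <- list(y    = sample(0:1, 50, replace = TRUE),
                 N    = 50,
                 oAnt = sample(0:2, 50, replace = TRUE),
                 cAnt = sample(0:1, 50, replace = TRUE))

# Fit a frequentist logistic regression on everything except N
df  <- as.data.frame(dataList[setdiff(names(dataList), "N")])
fit <- glm(y ~ ., data = df, family = binomial)

co <- coef(fit)
co[is.na(co)] <- 0  # constant columns (e.g. an all-ones nAnt) yield NA; zero them

# One inits list per chain, jittered so the chains start apart,
# named to match the model's b0, b1, ... parameters
initsList <- function() {
  as.list(setNames(co + rnorm(length(co), 0, 0.1),
                   paste0("b", seq_along(co) - 1)))
}
# jags.model("antPenFin.txt", data = dataList, inits = initsList, n.chains = 2)
```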

Running model etc.

jagsModel <- jags.model(file = "antPenFin.txt", data = dataList,
                        n.chains = 2, n.adapt = 500)

update(jagsModel, n.iter = 500)

codaSamples <- coda.samples(jagsModel,
                            variable.names = paste0("b", 0:9),
                            n.iter = 5000)

Problems

The output of my model looks totally off (which becomes clear when I try to plot it). I'm sure there's some very basic issue here. Could somebody help out?

Thanks a lot.

Upvotes: 1

Views: 3016

Answers (1)

mfidino

Reputation: 3055

Your issue here is that you cannot supply a range of numbers (from 0 to 3 in your case) as categorical covariates: currently, your model is interpreting those numbers as continuous. What you need to do is convert them to dummy variables, which you can do easily with the model.matrix function in R. I'm going to generate some data as an example, which you could then adapt to your own.

# generate y
y <- sample(0:1, 30, replace = TRUE)
# add three different categorical covariates. Note here that these are all
# factors.
oAnt <- factor(sample(0:2, 30, replace = TRUE))
cAnt <- factor(sample(0:1, 30, replace = TRUE))
nFin <- factor(sample(0:3, 30, replace = TRUE))

# create your model matrix
my_matrix <- model.matrix(y ~ oAnt + cAnt + nFin)

head(my_matrix)

   (Intercept) oAnt1 oAnt2 cAnt1 nFin1 nFin2 nFin3
1           1     1     0     1     0     1     0
2           1     1     0     1     1     0     0
3           1     1     0     0     0     0     1
4           1     1     0     1     0     0     1
5           1     0     0     1     0     1     0
6           1     1     0     1     0     0     1

For each factor predictor with n levels, you need to create n - 1 dummy columns (e.g. nFin ranges from 0 to 3, so you will create 3 columns of dummy variables). Thus, you are going to have many more regression coefficients to include in your model. Once you make your model matrix, you can convert it to a list as follows.

# remove the intercept from the list as you already have it in your model
matrix_list <- as.list(data.frame(my_matrix[,-1]))

From there, all you need to do is create some more predictors for your model. Also, if nAnt in your data is truly all ones, go ahead and remove it as well, as you are basically just including two intercepts.
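To avoid writing out one named coefficient per dummy column, one option (a sketch under my own assumptions, not part of the original answer) is to pass the model matrix itself into the data list and use inprod with a coefficient vector:

```
model {
    for (i in 1:N) {
        y[i] ~ dbern(mu[i])
        logit(mu[i]) <- inprod(X[i, ], b[])
    }
    # one weakly informative prior per column of X (intercept included)
    for (k in 1:K) {
        b[k] ~ dnorm(0, 0.0001)
    }
}
```

On the R side you would then supply something like `list(y = y, N = length(y), X = my_matrix, K = ncol(my_matrix))` and monitor "b" in coda.samples(). Note that X here keeps its intercept column, so the separate b0 is no longer needed.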

Upvotes: 0
