B. Davis
B. Davis

Reputation: 3441

Manually conduct leave-one-out cross validation for a GLMM using a for() loop in R

I am trying to build a for() loop to manually conduct leave-one-out cross validations for a GLMM fit using the lmer() function from the lme4 pkg. I need to remove an individual, fit the model and use the beta coefficients to predict a response for the individual that was withheld, and repeat the process for all individuals.

I have created some test data to tackle the first step of simply leaving an individual out, fitting the model and repeating for all individuals in a for() loop.

The data have a binary (0,1) Response, an IndID that classifies 4 individuals, a Time variable, and a Binary variable. There are N=100 observations. The IndID is fit as a random effect.

require(lme4)

#Make data
Response <- round(runif(100, 0, 1))
IndID <- as.character(rep(c("AAA", "BBB", "CCC", "DDD"),25))
Time <- round(runif(100, 2,50))
Binary <- round(runif(100, 0, 1))

#Make data.frame
Data <- data.frame(Response, IndID, Time, Binary)
Data <- Data[with(Data, order(IndID)), ]   #**Edit**: Added code to sort by IndID

#Look at head()
head(Data)
 Response IndID Time Binary
1        0   AAA   31      1
2        1   BBB   34      1
3        1   CCC    6      1
4        0   DDD   48      1
5        1   AAA   36      1
6        0   BBB   46      1

#Build model with all IndID's
fit <- lmer(Response ~ Time + Binary + (1|IndID ), data = Data, 
        family=binomial)
summary(fit)

As stated above, my hope is to get four model fits – one with each IndID left out in a for() loop. This is a new type of application of the for() command for me and I quickly reached my coding abilities. My attempt is below.

fit <- list()
for (i in Data$IndID){
    fit[[i]] <- lmer(Response ~ Time + Binary + (1|IndID), data = Data[-i], 
        family=binomial)
    }

I am not sure storing the model fits as a list is the best option, but I had seen it on a few other help pages. The above attempt results in the error:

Error in -i : invalid argument to unary operator

If I remove the [-i] conditional to the data=Data argument the code runs four fits, but data for each individual is not removed.

Just as an FYI, I will need to further expand the loop to: 1) extract the beta coefs, 2) apply them to the X matrix of the individual that was withheld and lastly, 3) compare the predicted values (after a logit transformation) to the observed values. As all steps are needed for each IndID, I hope to build them into the loop. I am providing the extra details in case my planned future steps inform the more intimidate question of leave-one-out model fits.

Thanks as always!

Upvotes: 1

Views: 2951

Answers (1)

Christopher Louden
Christopher Louden

Reputation: 7592

The problem you are having is because Data[-i] is expecting i to be an integer index. Instead, i is either AAA, BBB, CCC or DDD. To fix the loop, set

data = Data[Data$IndID != i, ]

in you model fit.

Upvotes: 2

Related Questions