Shima

Reputation: 147

loop through variables in R

I have a dataset with a binary outcome, Y, and 10 predictors (X1-X10).

set.seed(1001)
n <- 100
Y < c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
X1 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.4,0.5))
X2 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.5,0.25,0.25))
X3 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.3,0.4,0.4))
X4 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3))
X5 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.2,0.7))
X6 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.8,0.1,0.1))
X7 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.1,0.8))
X8 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3))
X9 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3))
X10 <- c(0,2,2,2,2,2,2,2,0,2,0,2,2,0,0,0,0,0,2,0,0,2,2,0,0,2,2,2,0,2,0,2,0,2,1,2,1,1,1,1,1,1,1,1,1,1,1,0,1,2,2,2,2,2,2,2,2,2,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0)

datasim <- data.frame(Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10)

My aim is to fit a separate logistic model for each predictor and calculate the deviance difference (dDeviance, the null deviance minus the residual deviance), and then to bootstrap dDeviance 1000 times (R=1000). I tried the following function, which works for one variable at a time. Can you suggest how I can change the code so that it loops through variables 1 to 10, calculates dDeviance for each, and then bootstraps the values?

library(boot)

# Statistic for boot(): deviance difference for a single predictor (X1)
glmfunction <- function(data, indices) {
  glm.snp1 <- glm(Y ~ X1, family = "binomial", data = data[indices, ])
  null <- glm.snp1$null.deviance
  residual <- glm.snp1$deviance
  dDeviance <- null - residual
  return(dDeviance)
}

result <- boot(datasim, glmfunction, R = 1000)

Upvotes: 0

Views: 562

Answers (1)

Heroka

Reputation: 13149

There are probably many ways to solve this, but here is how I would do it. First, I create a vector with the names of the independent variables I want to use in my models:

# vector of independent variable names (columns whose name starts with "X")
iv <- grep("^X", colnames(datasim), value = TRUE)

Then I loop over them inside the function to fit each model and extract its dDeviance. This way the boot function returns not a single value but a vector with one element per independent variable.

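glmfunction <- function(data, indices, iv) {
  res <- sapply(iv, function(x) {
    # build the formula Y ~ <predictor> and fit on the resampled rows
    fit <- glm(as.formula(sprintf("Y ~ %s", x)), family = "binomial",
               data = data[indices, ])
    # deviance difference: null deviance minus residual deviance
    dDeviance <- with(fit, null.deviance - deviance)
    return(dDeviance)
  })
  res
}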
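Before bootstrapping, it can help to call the function once with every row index, which should reproduce the deviance differences on the original data. The sketch below assumes a small made-up data frame (`toy`, hypothetical values rather than the question's `datasim`) and repeats the function so that it runs on its own:

```r
# Toy data standing in for datasim (hypothetical values, for illustration only)
set.seed(1)
toy <- data.frame(
  Y  = rep(c(1, 0), each = 20),
  X1 = sample(0:2, 40, replace = TRUE),
  X2 = sample(0:2, 40, replace = TRUE)
)
iv <- grep("^X", colnames(toy), value = TRUE)

# Same statistic as above: one deviance difference per predictor
glmfunction <- function(data, indices, iv) {
  sapply(iv, function(x) {
    fit <- glm(as.formula(sprintf("Y ~ %s", x)), family = "binomial",
               data = data[indices, ])
    with(fit, null.deviance - deviance)
  })
}

# Using each row index exactly once gives the fit on the original data
full <- glmfunction(toy, seq_len(nrow(toy)), iv)
print(full)  # named numeric vector, one entry per predictor

# Cross-check one entry against a direct single-predictor fit
direct <- glm(Y ~ X1, family = "binomial", data = toy)
stopifnot(isTRUE(all.equal(unname(full["X1"]),
                           direct$null.deviance - direct$deviance)))
```

Because `sapply` is given a character vector, the result carries the predictor names, which makes the later bootstrap output easier to read.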

I have chosen to make iv a formal argument of the function passed to boot, so you have to specify it explicitly. This avoids unexpected scoping issues and makes the function more flexible and easier to debug. boot passes any extra named arguments on to the function, so you can then run your bootstrap:

# R = 10 keeps this demo quick; the question asks for R = 1000
result <- boot(datasim, glmfunction, R = 10, iv = iv)
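Once the bootstrap has run, the returned object holds the statistic on the original data in `t0` and the replicates in `t`, one column per predictor. A minimal sketch of how one might inspect it, assuming the boot package is installed and using a made-up data frame (`toy`, hypothetical values) and an arbitrary R = 200 so the example runs on its own:

```r
library(boot)

# Toy data standing in for datasim (hypothetical values, for illustration only)
set.seed(1)
toy <- data.frame(
  Y  = rep(c(1, 0), each = 20),
  X1 = sample(0:2, 40, replace = TRUE),
  X2 = sample(0:2, 40, replace = TRUE)
)
iv <- grep("^X", colnames(toy), value = TRUE)

# Statistic: one deviance difference per predictor
glmfunction <- function(data, indices, iv) {
  sapply(iv, function(x) {
    fit <- glm(as.formula(sprintf("Y ~ %s", x)), family = "binomial",
               data = data[indices, ])
    with(fit, null.deviance - deviance)
  })
}

result <- boot(toy, glmfunction, R = 200, iv = iv)

# t0: statistic on the original data; t: R x length(iv) matrix of replicates
print(result$t0)
reps <- result$t
colnames(reps) <- iv  # label the columns so each predictor is identifiable
print(dim(reps))

# Bootstrap standard error of dDeviance for each predictor
boot_se <- apply(reps, 2, sd)
print(boot_se)

# Percentile interval for the first predictor; 'index' selects the column of t
ci_x1 <- boot.ci(result, type = "perc", index = 1)
print(ci_x1$percent[4:5])  # lower and upper limits
```

The same `boot.ci` call can be repeated with `index = 2`, `index = 3`, and so on to get an interval for each predictor in turn.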

Upvotes: 3
