Tabbi
Tabbi

Reputation: 69

How to make loop for one-at-a time logistic regression in R?

I want to do logistic regressions for several (n = 30) SNPs (coded as 0,1,2) as predictors and a casecontrol variable (0,1) as an outcome. As few of those rs are correlated, I cannot put all rs# in one model but have to run one at a time regression for each i.e., I cannot simply plus them together in one model like rs1 + rs2 + rs3 and so on....I need to have each regressed separately like below;

test1 = glm(casecontrol ~ rs1, data = mydata, family=binomial)

test2 = glm(casecontrol ~ rs2, data = mydata, family=binomial)

test3 = glm(casecontrol ~ rs3, data = mydata, family=binomial)

While I can run all the above regressions separately, is there a way to loop them together so I could get a summary() of all tests in one go?

I will have to adjust for age and sex too, but that would come after I run an unadjusted loop.

My data head from dput(head(mydata)) for example;

structure(list(ID = 1:6, sex = c(2L, 1L, 2L, 1L, 1L, 1L), age = c(52.4725405022036, 
58.4303618001286, 44.5300065923948, 61.4786037395243, 67.851808819687, 
39.7451378498226), bmi = c(31.4068751083687, 32.0614937413484, 
23.205021363683, 29.1445372393355, 32.6287483051419, 20.5887741968036
), casecontrol = c(0L, 1L, 0L, 1L, 1L, 1L), rs1 = c(1L, 0L, 2L, 
2L, 1L, 2L), rs2 = c(2L, 1L, 2L, 0L, 1L, 1L), rs3 = c(1L, 0L, 
1L, 2L, 2L, 2L), rs4 = c(1L, 1L, 1L, 1L, 0L, 2L), rs5 = c(1L, 
0L, 0L, 0L, 1L, 2L), rs6 = c(1L, 1L, 1L, 1L, 1L, 2L), rs7 = c(0L, 
0L, 0L, 0L, 0L, 0L), rs8 = c(0L, 0L, 1L, 0L, 2L, 1L), rs9 = c(0L, 
0L, 2L, 1L, 1L, 0L), rs10 = c(2L, 0L, 0L, 2L, 2L, 1L), rs11 = c(0L, 
1L, 1L, 0L, 1L, 1L), rs12 = c(1L, 2L, 0L, 1L, 2L, 2L), rs13 = c(0L, 
2L, 0L, 0L, 0L, 0L), rs14 = c(1L, 1L, 1L, 1L, 2L, 2L), rs15 = c(1L, 
2L, 1L, 1L, 0L, 1L), rs16 = c(0L, 2L, 1L, 2L, 2L, 1L), rs17 = c(0L, 
2L, 1L, 1L, 2L, 2L), rs18 = c(1L, 2L, 2L, 1L, 1L, 1L), rs19 = c(1L, 
1L, 0L, 1L, 2L, 2L), rs20 = c(2L, 1L, 0L, 2L, 2L, 1L), rs21 = c(1L, 
2L, 2L, 1L, 1L, 0L), rs22 = c(1L, 1L, 2L, 2L, 0L, 1L), rs23 = c(2L, 
0L, 2L, 1L, 1L, 1L), rs24 = c(0L, 0L, 0L, 2L, 2L, 2L), rs25 = c(2L, 
2L, 1L, 1L, 0L, 1L), rs26 = c(1L, 1L, 0L, 2L, 0L, 1L), rs27 = c(1L, 
1L, 1L, 1L, 0L, 1L), rs28 = c(0L, 1L, 1L, 2L, 0L, 2L), rs29 = c(2L, 
2L, 2L, 2L, 1L, 2L), rs30 = c(0L, 2L, 1L, 2L, 1L, 0L)), row.names = c(NA, 
6L), class = "data.frame")```

Upvotes: 1

Views: 2163

Answers (2)

AlexB
AlexB

Reputation: 3269

Probably you want something like this:

lapply(1:30, function(i) glm(as.formula(paste0('casecontrol ~ ', 'rs', i)), data = mydata, family = binomial))

which will execute 30 logistic regressions with the selected predictor.

Instead of hard coding the overall number of predictors, you can use: sum(grepl('rs', names(mydata))), which will return 30.

You can use tidy function from broom package to get the summary in a tidy format.

purrr::map_dfr(1:30, function(i) data.frame(model = i, tidy(glm(as.formula(paste0('casecontrol ~ ', 'rs', i)), data = mydata, family = binomial))))

or you can do this in a more dynamic way:

names(mydata)[grepl('rs', names(mydata))] -> pred #get all predictors that contain 'rs'

purrr::map_dfr(1:length(pred), 
               function(i) data.frame(model = i, 
                                      tidy(glm(as.formula(paste0('casecontrol ~ ', pred[i])), data = mydata, family = binomial))))

If you want to include another variable, you simply need to adjust the pred vector.

c(pred, paste0(pred, ' + age')) -> pred #interaction between rs drivers and age

or

c(pred, paste0(pred, ' + age + sex')) -> pred #interaction between rs drivers age and sex

Upvotes: 2

Iram shahzadi
Iram shahzadi

Reputation: 43

you can do something like this

outcome<-mydata %>% select("casecontrol")  #select outcome

features <- mydata %>% select("rs1":"rs30") #select features

features_names<-data.frame(colnames(features)) #get feature names

for(i in 1:nrow(features_names))     # loop over length of features_name
{selected=features[,i,drop=FALSE]    #get each feature 
total <- cbind(selected, response)   # combine with outcome
glm.fit <- glm(casecontrol ~ ., data = total, family = binomial("logit")) 
summary(glm.fit)
}

Upvotes: 0

Related Questions