Reputation: 69
I want to do logistic regressions for several (n = 30) SNPs (coded as 0,1,2) as predictors and a casecontrol variable (0,1) as an outcome. As few of those rs are correlated, I cannot put all rs# in one model but have to run one at a time regression for each i.e., I cannot simply plus them together in one model like rs1 + rs2 + rs3 and so on....I need to have each regressed separately like below;
test1 = glm(casecontrol ~ rs1, data = mydata, family=binomial)
test2 = glm(casecontrol ~ rs2, data = mydata, family=binomial)
test3 = glm(casecontrol ~ rs3, data = mydata, family=binomial)
While I can run all the above regressions separately, is there a way to loop them together so I could get a summary() of all tests in one go?
I will have to adjust for age and sex too, but that would come after I run an unadjusted loop.
My data head from dput(head(mydata))
for example;
structure(list(ID = 1:6, sex = c(2L, 1L, 2L, 1L, 1L, 1L), age = c(52.4725405022036,
58.4303618001286, 44.5300065923948, 61.4786037395243, 67.851808819687,
39.7451378498226), bmi = c(31.4068751083687, 32.0614937413484,
23.205021363683, 29.1445372393355, 32.6287483051419, 20.5887741968036
), casecontrol = c(0L, 1L, 0L, 1L, 1L, 1L), rs1 = c(1L, 0L, 2L,
2L, 1L, 2L), rs2 = c(2L, 1L, 2L, 0L, 1L, 1L), rs3 = c(1L, 0L,
1L, 2L, 2L, 2L), rs4 = c(1L, 1L, 1L, 1L, 0L, 2L), rs5 = c(1L,
0L, 0L, 0L, 1L, 2L), rs6 = c(1L, 1L, 1L, 1L, 1L, 2L), rs7 = c(0L,
0L, 0L, 0L, 0L, 0L), rs8 = c(0L, 0L, 1L, 0L, 2L, 1L), rs9 = c(0L,
0L, 2L, 1L, 1L, 0L), rs10 = c(2L, 0L, 0L, 2L, 2L, 1L), rs11 = c(0L,
1L, 1L, 0L, 1L, 1L), rs12 = c(1L, 2L, 0L, 1L, 2L, 2L), rs13 = c(0L,
2L, 0L, 0L, 0L, 0L), rs14 = c(1L, 1L, 1L, 1L, 2L, 2L), rs15 = c(1L,
2L, 1L, 1L, 0L, 1L), rs16 = c(0L, 2L, 1L, 2L, 2L, 1L), rs17 = c(0L,
2L, 1L, 1L, 2L, 2L), rs18 = c(1L, 2L, 2L, 1L, 1L, 1L), rs19 = c(1L,
1L, 0L, 1L, 2L, 2L), rs20 = c(2L, 1L, 0L, 2L, 2L, 1L), rs21 = c(1L,
2L, 2L, 1L, 1L, 0L), rs22 = c(1L, 1L, 2L, 2L, 0L, 1L), rs23 = c(2L,
0L, 2L, 1L, 1L, 1L), rs24 = c(0L, 0L, 0L, 2L, 2L, 2L), rs25 = c(2L,
2L, 1L, 1L, 0L, 1L), rs26 = c(1L, 1L, 0L, 2L, 0L, 1L), rs27 = c(1L,
1L, 1L, 1L, 0L, 1L), rs28 = c(0L, 1L, 1L, 2L, 0L, 2L), rs29 = c(2L,
2L, 2L, 2L, 1L, 2L), rs30 = c(0L, 2L, 1L, 2L, 1L, 0L)), row.names = c(NA,
6L), class = "data.frame")```
Upvotes: 1
Views: 2163
Reputation: 3269
Probably you want something like this:
lapply(1:30, function(i) glm(as.formula(paste0('casecontrol ~ ', 'rs', i)), data = mydata, family = binomial))
which will execute 30 logistic regressions with the selected predictor.
Instead of hard coding the overall number of predictors, you can use:
sum(grepl('rs', names(mydata)))
, which will return 30.
You can use tidy
function from broom
package to get the summary in a tidy format.
purrr::map_dfr(1:30, function(i) data.frame(model = i, tidy(glm(as.formula(paste0('casecontrol ~ ', 'rs', i)), data = mydata, family = binomial))))
or you can do this in a more dynamic way:
names(mydata)[grepl('rs', names(mydata))] -> pred #get all predictors that contain 'rs'
purrr::map_dfr(1:length(pred),
function(i) data.frame(model = i,
tidy(glm(as.formula(paste0('casecontrol ~ ', pred[i])), data = mydata, family = binomial))))
If you want to include another variable, you simply need to adjust the pred
vector.
c(pred, paste0(pred, ' + age')) -> pred #interaction between rs drivers and age
or
c(pred, paste0(pred, ' + age + sex')) -> pred #interaction between rs drivers age and sex
Upvotes: 2
Reputation: 43
you can do something like this
outcome<-mydata %>% select("casecontrol") #select outcome
features <- mydata %>% select("rs1":"rs30") #select features
features_names<-data.frame(colnames(features)) #get feature names
for(i in 1:nrow(features_names)) # loop over length of features_name
{selected=features[,i,drop=FALSE] #get each feature
total <- cbind(selected, response) # combine with outcome
glm.fit <- glm(casecontrol ~ ., data = total, family = binomial("logit"))
summary(glm.fit)
}
Upvotes: 0