Reputation: 477
I have a dataset with an outcome column, status, and many feature columns. (Since this question has been solved, I removed the dataset sample.)
Now I need to run a glm model of the outcome status on each variable separately and retrieve the p-values that are < 0.05. I am trying to use a loop to do this, but I cannot write a correct one.
My idea is to first create a list to hold the results of all the glm models, then use another list to store all the p-values from summary(), and finally use filter to drop the records with p > 0.05.
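for (i in colnames(df2)){           # note: this also loops over status itself
  list_glm<-list()                  # re-created on every pass, so earlier results are lost
  z<-list()
  # fails: inside the formula, i is read as a variable named i,
  # not as the column name that i holds
  list_glm<-glm(status~i, data =df2, family = binomial())
  z<-summary(list_glm)$coefficients[,4]
}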
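The main bug is in glm(status~i, ...): inside a formula, i is treated as the name of a variable, not as the column name stored in i, so R looks up a variable called i (the loop's one-element character string) and the call fails instead of fitting a model on that column. A minimal sketch of building the formula from the string instead, using base R's reformulate() or as.formula(); the column name feature1 is hypothetical:

```r
# Hypothetical column name standing in for one value of `i` in the loop
i <- "feature1"

# Two equivalent ways to turn the string into the formula status ~ feature1
f1 <- reformulate(i, response = "status")
f2 <- as.formula(paste("status ~", i))

print(f1)
print(f2)
```

Either formula can then be passed as glm(f1, data = df2, family = binomial()).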
Could someone help me figure it out? Thanks a lot!
Upvotes: 1
Views: 343
Reputation: 8110
I would go from wide to long, nest the data, and then fit one regression per feature in a single pipeline. Then you can map out the p-value of each model and keep only the features with p < 0.05. It looks like there are 4 models that fit the criteria for your example data.
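library(tidyverse)  # the native pipe |> needs R >= 4.1

df |>
  # one row per (status, feature name, feature value)
  pivot_longer(cols = -status) |>
  # one nested data frame per feature
  nest(data = -name) |>
  # fit status ~ value for each feature; row 2, column 4 is the slope's p-value
  mutate(mod = map(data, ~glm(status ~ value, data = .x, family = binomial())),
         p.value = map_dbl(mod, ~summary(.x)$coefficients[2, 4])) |>
  select(name, p.value) |>
  filter(p.value < 0.05)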
#> # A tibble: 4 x 2
#> name p.value
#> <chr> <dbl>
#> 1 feature10 0.0370
#> 2 feature34 0.0243
#> 3 feature41 0.0189
#> 4 feature86 0.0498
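For readers without the tidyverse installed, the same wide-to-long-then-nest idea can be sketched in base R with stack() and split(). The data frame below is simulated, with hypothetical columns feature1 to feature3, since the original sample was removed; feature2 is deliberately made to depend on status so that at least one p-value is small.

```r
set.seed(42)
# Hypothetical data standing in for the removed sample: a 0/1 outcome and
# three numeric features, with feature2 shifted by status
n <- 150
df <- data.frame(status = rbinom(n, 1, 0.5),
                 feature1 = rnorm(n),
                 feature2 = rnorm(n),
                 feature3 = rnorm(n))
df$feature2 <- df$feature2 + df$status

feats <- df[setdiff(names(df), "status")]

# Wide to long: stack() returns columns `values` and `ind` (the feature name);
# status is repeated once per feature so the rows stay aligned
long <- data.frame(status = rep(df$status, times = ncol(feats)), stack(feats))

# "Nest": one data frame per feature
by_feature <- split(long, long$ind)

# Fit status ~ values in each group and keep the slope's p-value
p_values <- sapply(by_feature, function(d) {
  fit <- glm(status ~ values, data = d, family = binomial())
  summary(fit)$coefficients["values", "Pr(>|z|)"]
})

print(p_values[p_values < 0.05])
```

Indexing the coefficient table by name ("values", "Pr(>|z|)") instead of by position [2, 4] is a small design choice: it still picks the right cell if the model ever gains extra terms.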
Upvotes: 1
Reputation: 325
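list_glm <- list()   # fitted models, one per feature
z <- list()          # p-values, one vector per feature
# assumes status is the first column of df2, so start at column 2
for (i in colnames(df2)[2:length(colnames(df2))]) {
  formula <- as.formula(paste0("status ~ ", i))
  list_glm[[i]] <- glm(formula = formula, data = df2, family = binomial())
  # column 4 of the coefficient table holds Pr(>|z|) for intercept and slope
  z[[i]] <- summary(list_glm[[i]])$coefficients[, 4]
}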
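The loop above stores two p-values per feature (intercept and slope) and stops before the filtering step the question asks for. A sketch of that last step, run on simulated data: the df2 below is hypothetical, with status as its first column and feature3 made to depend on it, and colnames(df2)[-1] is used as an equivalent of colnames(df2)[2:length(colnames(df2))].

```r
set.seed(7)
# Hypothetical df2: status in the first column, then four numeric features,
# with feature3 shifted by status
n <- 200
df2 <- data.frame(status = rbinom(n, 1, 0.5),
                  matrix(rnorm(n * 4), ncol = 4,
                         dimnames = list(NULL, paste0("feature", 1:4))))
df2$feature3 <- df2$feature3 + df2$status

list_glm <- list()
z <- list()
for (i in colnames(df2)[-1]) {  # every column except status
  list_glm[[i]] <- glm(as.formula(paste("status ~", i)), data = df2,
                       family = binomial())
  z[[i]] <- summary(list_glm[[i]])$coefficients[, 4]
}

# Each z[[i]] holds (intercept p-value, slope p-value); keep the slope's
slope_p <- vapply(z, function(p) unname(p[2]), numeric(1))

# Final step from the question: keep only features with p < 0.05
significant <- slope_p[slope_p < 0.05]
print(significant)
```

vapply() names its result after the elements of z, so significant is a numeric vector named by feature.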
Upvotes: 1