Reputation: 4636
Lets say I have multiple models (two survival and logistic here for convenience) and I just want to look at sex
estimates.
library(survival)
data(colon)
sn <- Surv(colon$time, colon$status)
fit <- coxph(sn ~ sex + perfor + age, data = colon)
fit1 <- coxph(sn ~ sex + perfor + surg + rx , data = colon)
fit2 <- glm(factor(status) ~ sex + age, data=colon, family=binomial(link = "logit"))
fit3 <- glm(factor(status) ~ sex + age + nodes, data=colon, family=binomial(link = "logit"))
I want the following dataframe (df
) as my final output with model name and estimate. I also want a modified version of df
, df2
, so that there is a grouping effect where logistic and survival models are in different columns. The fact that they are ordered (2 survival followed by 2 logistic) makes this easier. Is there a way to extend this to a more general form and pre-define the layout of the dataset, lets say we had five models of survival/logistic/lme so we need 5 rows x 3 column type dataset.
> df
model estimate
1 fit 0.97 (95 % CI 0.85 to 1.10)
2 fit1 0.94 (95 % CI 0.83 to 1.07)
3 fit2 0.97 (95 % CI 0.81 to 1.17)
4 fit3 0.98 (95 % CI 0.81 to 1.18)
and also
> df2
model_survival sur_estimate model_logistic logistic_estimate
1 fit 0.97 (95 % CI 0.85 to 1.10) fit2 0.97 (95 % CI 0.81 to 1.17)
2 fit1 0.94 (95 % CI 0.83 to 1.07) fit3 0.98 (95 % CI 0.81 to 1.18)
My attempt so far:
I have used lapply
which I think is better than a for loop
and have pretty much worked it out but I would like the section that is outside lapply, to be inside so it is more automated if I had more models etc. See below.
mylist<-list(fit,fit1,fit2,fit3)
results <- list()
results <- lapply(mylist, function(x) {
sprintf("%.2f (95 %% CI %.2f to %.2f)",
exp(coef(x))["sex"],
exp(confint(x)[,1])["sex"],
exp(confint(x)[,2])["sex"])
})
results <- do.call(rbind.data.frame, results)
I can make results
to look like df
by doing the following but I would like this inside lapply
so I don't need to get names again, just use them from mylist etc. but results$model<-names(mylist)
doesn't work.
colnames(results)[1]<-"estimate"
results$model<-c("fit","fit1","fit2","fit3")
To get df2
I could do some long to wide conversion but can I pre-define the layout and column names inside lapply
(I know I will probably need two separate lapply
- one for df
and df2
). Thanks.
Upvotes: 1
Views: 253
Reputation: 886938
We can use map
with stack
library(tidyverse)
out <- mget(ls(pattern = "fit\\d*")) %>%
map(~sprintf("%.2f (95 %% CI %.2f to %.2f)",
exp(coef(.x))["sex"],
exp(confint(.x)[,1])["sex"],
exp(confint(.x)[,2])["sex"])) %>%
stack %>%
select(model = ind, estimate = values)
out
# model estimate
#1 fit 0.97 (95 % CI 0.85 to 1.10)
#2 fit1 0.94 (95 % CI 0.83 to 1.07)
#3 fit2 0.97 (95 % CI 0.81 to 1.17)
#4 fit3 0.98 (95 % CI 0.81 to 1.18)
From the 'out', we can get the second output
library(data.table)#using dcast as it can take multiple value.vars
out %>%
group_by(group = rep(c("model_survival", "model_logistic"), each = 2)) %>%
mutate(rn = row_number()) %>%
as.data.table %>%
dcast(., rn ~ group, value.var = c('model', 'estimate')) %>%
select(-rn)
# model_model_logistic model_model_survival estimate_model_logistic estimate_model_survival
#1: fit2 fit 0.97 (95 % CI 0.81 to 1.17) 0.97 (95 % CI 0.85 to 1.10)
#2: fit3 fit1 0.98 (95 % CI 0.81 to 1.18) 0.94 (95 % CI 0.83 to 1.07)
Upvotes: 1