Reputation: 61
I'm running a survival analysis on the expression levels of each of 566 genes. I did this by combining the function coxph() with the function lapply, and it worked well. Now, because of the large number of genes, I am stuck on how to filter by P-value so as to keep only the genes significantly associated with survival, i.e., those with P<0.05.
This is the dummy data:
df1 = structure(list(ERLIN2 = structure(c(`TCGA-A1-A0SE-01` = 1L, `TCGA-A1-A0SH-01` = 1L,
`TCGA-A1-A0SJ-01` = 1L), .Label = c("down", "up"), class = "factor"),
BRF2 = structure(c(`TCGA-A1-A0SE-01` = 2L, `TCGA-A1-A0SH-01` = 1L,
`TCGA-A1-A0SJ-01` = 2L), .Label = c("down", "up"), class = "factor"),
ZNF703 = structure(c(`TCGA-A1-A0SE-01` = 2L, `TCGA-A1-A0SH-01` = 1L,
`TCGA-A1-A0SJ-01` = 2L), .Label = c("down", "up"), class = "factor"),
time = c(43.4, 47.21, 13.67), event = c(0, 0, 0)), row.names = c("TCGA-A1-A0SE-01",
"TCGA-A1-A0SH-01", "TCGA-A1-A0SJ-01"), class = "data.frame")
To reproduce the results, run the code below:
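# Load the survival package, installing it first if needed
if (!require(survival)) install.packages('survival')
library('survival')
# Fit one Cox model per gene and keep each model's summary
df2 = lapply(c("ERLIN2", "BRF2", "ZNF703"),
  function(x) {
    # x is a column name, so it can be pasted into the formula directly
    formula <- as.formula(paste('Surv(time, event) ~', x))
    coxFit <- coxph(formula, data = df1)
    summary(coxFit)
  })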
From here, I'm trying to do P-value filtration as follows:
for (i in 3){
df2 = df2 %>% subset(df2[[i]]$logtest[3] < 0.05)
}
But it is inefficient! Any help would be appreciated!
Upvotes: 0
Views: 212
Reputation: 2949
If you are interested in subsetting the list by any variable (the p-value of logtest in your case), I would suggest the rlist package:
library(rlist)
df3 <- list.filter(df2, logtest[["pvalue"]] < 0.05)
This will filter the list by the conditions specified. The conditions can be nested as well.
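If you would rather not add a dependency, the same filtering can probably be done in base R. The sketch below is only an illustration under some assumptions: it uses the `lung` dataset that ships with survival as stand-in data (the dummy data in the question has no events), and the names `covariates`, `fits`, `pvals` and `significant` are made up for the example. With your data you would swap in `df1`, `Surv(time, event)` and your vector of gene names.

```r
library(survival)

# Stand-in data (assumption): the `lung` dataset shipped with survival.
# Replace `covariates` with your gene names and `lung` with df1.
covariates <- c("age", "sex", "ph.ecog", "meal.cal")

# Fit one Cox model per covariate; naming the input vector means each
# summary in the resulting list carries its covariate name.
fits <- lapply(setNames(covariates, covariates), function(x) {
  f <- as.formula(paste("Surv(time, status) ~", x))
  summary(coxph(f, data = lung))
})

# Pull the likelihood-ratio test p-value out of every summary
# as a named numeric vector.
pvals <- vapply(fits, function(s) s$logtest[["pvalue"]], numeric(1))

# Keep only the summaries whose p-value is below 0.05.
significant <- fits[pvals < 0.05]
names(significant)
```

Keeping `pvals` as a separate named vector also makes it easy to inspect or sort the p-values before deciding what to keep.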
Upvotes: 1