Simon H

Reputation: 75

Apply function for survival analysis

I have a data frame with patient survival data and gene expression data that looks like this:

# Patients event time Gene_1 ... Gene_100
1 Patient_1 1 356 3455 ... 59393
2 Patient_2 1 1233 6632 ... 43299
3 Patient_3 0 1224 3636 ... 44222
4 Patient_4 0 56 30603 ... 23999
...
100 Patient_100 1 853 ... 25888

I wrote a function that splits the expression values of a single gene (e.g. Gene_1) into quartiles and keeps only the lowest and highest quartile for comparison in a survival analysis:

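library(dplyr)     # ntile()
library(survival)  # coxph(), Surv()

quartile_function <- function(dataframe, column_x) {
  # Assign each patient to an expression quartile (1 = lowest, 4 = highest)
  dataframe$quartile <- ntile(dataframe[, column_x], 4)
  # Keep only the lowest and highest quartile
  dataframe <- subset(dataframe, quartile == 1 | quartile == 4)
  group <- dataframe$quartile
  # Fit the Cox model and return only the coefficient for group
  coxph(Surv(time, event) ~ group, data = dataframe)[["coefficients"]]
}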

The function fits a Cox proportional hazards model on these two groups; I'm only interested in the coefficient (coef).

That all works when I pick the genes column by column, but I struggle to apply this function to every column containing gene expression data in the dataframe.

Any ideas to do this more efficiently? And how can I apply this function easily to every column with Gene expression data, so that I get an overview of the coef for every gene?

Upvotes: 0

Views: 243

Answers (1)

Swapnil

Reputation: 164

If I understand correctly, you want to call quartile_function 100 times, each time passing a different column number as column_x.

In that case, the following should work:

sapply(seq(a, b), function(x) quartile_function(df, x), simplify = TRUE)

where df is your data frame, a is the column number of Gene_1, and b is the column number of Gene_100.
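
As a rough sketch of the same idea, the gene columns could also be selected by name rather than by position, so a and b do not have to be looked up by hand. The toy data frame below is simulated purely for illustration (it is not the asker's data), and the names toy and gene_cols are made up for this example. It assumes that all gene columns start with "Gene_" and that the data is a plain data.frame, as in the question.

```r
library(dplyr)     # ntile()
library(survival)  # coxph(), Surv()

# The function from the question, unchanged in behaviour
quartile_function <- function(dataframe, column_x) {
  dataframe$quartile <- ntile(dataframe[, column_x], 4)
  dataframe <- subset(dataframe, quartile == 1 | quartile == 4)
  group <- dataframe$quartile
  coxph(Surv(time, event) ~ group, data = dataframe)[["coefficients"]]
}

# Simulated toy data (assumption: same layout as in the question)
set.seed(1)
n <- 100
toy <- data.frame(
  Patients = paste0("Patient_", seq_len(n)),
  event    = rbinom(n, 1, 0.6),
  time     = round(rexp(n, 1 / 500)) + 1,
  Gene_1   = rpois(n, 3000),
  Gene_2   = rpois(n, 6000),
  Gene_3   = rpois(n, 40000)
)

# Pick every column whose name starts with "Gene_"
gene_cols <- grep("^Gene_", names(toy), value = TRUE)

# One coefficient per gene; unname() drops the "group" label so that
# the result is named by gene only
coefs <- sapply(gene_cols, function(g) unname(quartile_function(toy, g)))

# Overview of the coefficients, one per gene
print(coefs)
```

Because sapply is given a character vector, the result is a numeric vector named by gene, which can be sorted or turned into a data frame for an overview.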

Upvotes: 1
