Süleyman Bozkurt

Reputation: 11

Survdiff Analysis of two genes in R

I am trying to analyze the expression of 12 genes measured in tumor patients. I need to run a log-rank test with the survdiff function in R.

First, I split the patients at the median expression of each gene: one group is above the median, the other is at or below it. I get the p-value and Kaplan-Meier curve for an individual gene like this:

test <- survdiff(Surv(surv, stat) ~ genename > median(genename), data = my.Data)

Now I want to combine two genes, get the p-value from a log-rank test, and draw the Kaplan-Meier curve. A patient should only be included if both genes fall on the same side: both above the median or both below it.

I tried this:

gene1_gene2 <- survdiff(Surv(surv, stat) ~ (gene1 > median(gene1)) + (gene2> median(gene2)), data = my.Data)

                                                      N Observed Expected (O-E)^2/E (O-E)^2/V
gene1> median(gene1)=FALSE, gene2 > median(gene2)=FALSE 70        9     24.5     9.787     17.70
gene1> median(gene1)=FALSE, gene2> median(gene2)=TRUE  19        5      6.8     0.478      0.55
gene1> median(gene1)=TRUE, gene2> median(gene2)=FALSE  19        7      4.0     2.256      2.45
gene1> median(gene1)=TRUE, gene2> median(gene2)=TRUE   69       34     19.7    10.338     16.19

 Chisq= 23  on 3 degrees of freedom, p= 3.98e-05 

This gives four groups, but I only need two of them:

gene1> median(gene1)=FALSE, gene2 > median(gene2)=FALSE
gene1> median(gene1)=TRUE, gene2> median(gene2)=TRUE

These are the two groups I want: the first is below the median for both genes, and the second is above the median for both.

How can I do that? Please help; I hope the problem is clear.

Best

Upvotes: 1

Views: 417

Answers (1)

TomL

Reputation: 31

You should consider splitting on the mean expression instead of the median in your survival analysis, since the median will always split your cohort in half. From a biological standpoint, cohorts will never have exactly 50% events (deaths, metastasis, or any other relevant parameter of interest).

That being said, I strongly recommend the following R code:
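To make the difference between the two cut-offs concrete, here is a small sketch on simulated data. The values in `expr` are hypothetical (a right-skewed log-normal sample, not real expression data); the point is only to compare how many patients land in each group under a median split versus a mean split.

```r
# Simulated, right-skewed expression values (hypothetical data)
set.seed(1)
expr <- rlnorm(177, meanlog = 2, sdlog = 1)

# Median split: the two groups are as balanced as possible
median_split <- table(high = expr > median(expr))

# Mean split: for right-skewed data the mean sits above the median,
# so fewer patients end up in the "high" group
mean_split <- table(high = expr > mean(expr))

print(median_split)
print(mean_split)
```

Which cut-off is more appropriate depends on the distribution of the gene and on the question being asked, so it may be worth looking at both tables before choosing.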

survdiff(Surv(TIME, EVENTS) ~ PREDICTION, data = my.Dat)

where TIME is the follow-up time, EVENTS is the patient status (0: no event, 1: event), and PREDICTION is a column that defines the expression groups. Consider the following code for filling the PREDICTION column:

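my.Dat$PREDICTION <- NA
# UP: both genes above their medians
my.Dat$PREDICTION[which(my.Dat$gene1 > median(my.Dat$gene1) & my.Dat$gene2 > median(my.Dat$gene2))] <- "UP"
# DOWN: both genes at or below their medians (the FALSE/FALSE group in the question)
my.Dat$PREDICTION[which(my.Dat$gene1 <= median(my.Dat$gene1) & my.Dat$gene2 <= median(my.Dat$gene2))] <- "DOWN"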

This way, patients with high gene1 and gene2 expression are labeled UP, and patients with low expression of both genes are labeled DOWN. The other combinations (gene1 high/gene2 low and gene1 low/gene2 high) stay NA, so they are excluded from the survival statistics.
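Putting the pieces together, a minimal end-to-end sketch might look like the following. It assumes the survival package is installed and uses simulated data in place of the real cohort; the column names `surv`, `stat`, `gene1` and `gene2` are taken from the question, and every value is made up. Here PREDICTION is passed as a plain grouping variable, and survfit is used for the Kaplan-Meier curves that the question also asks about.

```r
library(survival)

# Simulated stand-in for the real data (all values hypothetical)
set.seed(42)
n <- 177
my.Dat <- data.frame(
  surv  = rexp(n, rate = 0.1),   # follow-up time
  stat  = rbinom(n, 1, 0.4),     # 1 = event, 0 = censored
  gene1 = rnorm(n),
  gene2 = rnorm(n)
)

# Flag each gene as above its own median
high1 <- my.Dat$gene1 > median(my.Dat$gene1)
high2 <- my.Dat$gene2 > median(my.Dat$gene2)

# Keep only concordant patients; discordant ones stay NA
my.Dat$PREDICTION <- NA_character_
my.Dat$PREDICTION[high1 & high2]   <- "UP"
my.Dat$PREDICTION[!high1 & !high2] <- "DOWN"

# Log-rank test between UP and DOWN (rows with NA are dropped by default)
test <- survdiff(Surv(surv, stat) ~ PREDICTION, data = my.Dat)
print(test)

# Kaplan-Meier curves for the same two groups
fit <- survfit(Surv(surv, stat) ~ PREDICTION, data = my.Dat)
plot(fit, col = c("blue", "red"), xlab = "Time", ylab = "Survival probability")
legend("topright", legend = names(fit$strata), col = c("blue", "red"), lty = 1)
```

With only two groups left, the test has 1 degree of freedom instead of the 3 shown in the question's output.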

Cheers.

Upvotes: 1
