Reputation: 155
I thought this would be a trivial thing to do but I still have some trouble adjusting to writing code instead of pointing and clicking on a spreadsheet.
month = as.integer(c(1,2,3,4,5,6,7,8,9,10,11,12))
remaining = c(1000,925,852,790,711,658,601,567,530,501,485,466)
left = c(75, 73, 62, 79, 53, 57, 34, 37, 29, 16, 19, 0)
KPdata = data.frame(month, remaining, left)
> KPdata
month remaining left
1 1 1000 75
2 2 925 73
3 3 852 62
4 4 790 79
5 5 711 53
6 6 658 57
7 7 601 34
8 8 567 37
9 9 530 29
10 10 501 16
11 11 485 19
12 12 466 12
How do I calculate the Kaplan-Meier survival function at each month? Note that I want to do this manually, I am aware that there are packages which will do it for me.
Upvotes: 1
Views: 993
Reputation: 349
I was so taken with the question and @bouncyball's inspiring answer, I thought I'd add my ha'penny worth with an attempt at handling censoring. This is intended to be in the spirit of the original question - doing things 'handraulically' to develop key insights.
## rename remaining -> survived; left -> died
month = as.integer(c(1,2,3,4,5,6,7,8,9,10,11,12))
survived = c(1000,925,852,790,711,658,601,567,530,501,485,466)
died = c(75, 73, 62, 79, 53, 57, 34, 37, 29, 16, 19, 0)
## arbitrary censoring @ 10 per time period
censored <- c(0, rep(10,11))
KPdata3 = data.frame(month, at.risk, censored, died, survived)
## define those at risk <= those who survived
## awful bit of R fiddling for (something simple like) offsetting the index in base R
len <- length(month)
at.risk <- c(survived[1],
survived[-len] - died[-len] - cumsum(censored[-len]) )
## note use of cumsum()
## censoring uses at risk, rather than survived/remained
KPdata3$KM_increment <- (KPdata3$at.risk - KPdata3$died)/ KPdata3$at.risk
## code credit to @bouncyball
KPdata3$KM_cumulative <- cumprod(KPdata3$KM_increment)
KPdata3
Gives this.....
month at.risk censored died survived KM_increment KM_cumulative
1 1 1000 0 75 1000 0.9250000 0.9250000
2 2 925 10 73 925 0.9210811 0.8520000
3 3 842 10 62 852 0.9263658 0.7892637
4 4 770 10 79 790 0.8974026 0.7082873
5 5 681 10 53 711 0.9221733 0.6531636
6 6 618 10 57 658 0.9077670 0.5929203
7 7 551 10 34 601 0.9382940 0.5563336
8 8 507 10 37 567 0.9270217 0.5157333
9 9 460 10 29 530 0.9369565 0.4832197
10 10 421 10 16 501 0.9619952 0.4648551
11 11 395 10 19 485 0.9518987 0.4424949
12 12 366 10 0 466 1.0000000 0.4424949
Setting rep(0,11)
gives the same answer as @bouncyball's.
Upvotes: 0
Reputation: 10761
I think this is what you're trying to do. We use lag
and cumprod
to get a manual KM estimator:
KPdata$KM_init <- lag((KPdata$remaining - KPdata$left) / KPdata$remaining)
KPdata[1,ncol(KPdata)] <- 1
KPdata$KM_final <- cumprod(KPdata$KM_init)
KPdata
month remaining left KM_init KM_final
1 1 1000 75 1.0000000 1.000
2 2 925 73 0.9250000 0.925
3 3 852 62 0.9210811 0.852
4 4 790 79 0.9272300 0.790
5 5 711 53 0.9000000 0.711
6 6 658 57 0.9254571 0.658
7 7 601 34 0.9133739 0.601
8 8 567 37 0.9434276 0.567
9 9 530 29 0.9347443 0.530
10 10 501 16 0.9452830 0.501
11 11 485 19 0.9680639 0.485
12 12 466 0 0.9608247 0.466
Alternatively, I think there's a different form of a KM estimator that looks like this (note that I've added a row corresponding to month = 0
):
month = as.integer(c(0,1,2,3,4,5,6,7,8,9,10,11,12))
remaining = c(1000,1000,925,852,790,711,658,601,567,530,501,485,466)
left = c(0,75, 73, 62, 79, 53, 57, 34, 37, 29, 16, 19, 0)
KPdata2 = data.frame(month, remaining, left)
KPdata2$KM_init <- (KPdata2$remaining - KPdata2$left) / KPdata2$remaining
KPdata2$KM_final <- cumprod(KPdata2$KM_init)
KPdata2
month remaining left KM_init KM_final
1 0 1000 0 1.0000000 1.000
2 1 1000 75 0.9250000 0.925
3 2 925 73 0.9210811 0.852
4 3 852 62 0.9272300 0.790
5 4 790 79 0.9000000 0.711
6 5 711 53 0.9254571 0.658
7 6 658 57 0.9133739 0.601
8 7 601 34 0.9434276 0.567
9 8 567 37 0.9347443 0.530
10 9 530 29 0.9452830 0.501
11 10 501 16 0.9680639 0.485
12 11 485 19 0.9608247 0.466
13 12 466 0 1.0000000 0.466
Upvotes: 3