Reputation: 1511
I am trying to replicate the Kaplan-Meier table shown in figure 1 here. The figure is:
This is the code I wrote:
# Python code to create the above Kaplan Meier curve
from lifelines import KaplanMeierFitter
import pandas as pd
df = pd.DataFrame({
'T':[0,0,0,0,0,0,2.5,2.5,2.5,2.5,2.5,4,4,4,4,4,5,5,5,6,6],
'E':[0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,0],
})
## create a kmf object
kmf = KaplanMeierFitter()
## Fit the data into the model
kmf.fit(df['T'], df['E'],label='Kaplan Meier Estimate')
## Plot the estimated survival curve without confidence intervals
kmf.plot(ci_show=False)
My output plot is different (see the scale):
When I print the survival function, it is different:
Kaplan Meier Estimate
timeline
0.0 1.0000
2.5 0.9375
4.0 0.7500
5.0 0.6000
6.0 0.6000
I suspect I didn't translate the data into a DataFrame properly. I tried adjusting the DataFrame, adding the 1 event to the start and to the end of the time frame, but it made no difference. Can someone show me how to replicate the example I'm trying to work on?
Reputation: 1726
The comment by @Arne is correct. There are 6 subjects, so there should only be 6 elements in each of your T and E vectors. Recall that each element of these vectors is a single subject: T represents how long that subject was observed for, and E represents whether the subject's "death" was observed or not.
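To make the one-element-per-subject point concrete, here is a minimal sketch that writes out the six subjects by hand and computes the product-limit estimate in plain Python. The durations and event flags are read off the survival table used in this answer (1 death at 2.5, 2 at 4, 1 at 5, 2 censored at 6); I am assuming that table matches figure 1. The kaplan_meier helper is a hypothetical name for illustration, not part of lifelines.

```python
# One entry per subject (6 subjects in total).
# Values are taken from the survival table in this answer,
# assumed to match figure 1 of the linked example.
T = [2.5, 4.0, 4.0, 5.0, 6.0, 6.0]  # how long each subject was observed
E = [1, 1, 1, 1, 0, 0]              # 1 = death observed, 0 = censored


def kaplan_meier(durations, events):
    """Hypothetical helper: return {time: S(time)} via the product-limit rule."""
    survival = 1.0
    estimate = {0.0: 1.0}
    for t in sorted(set(durations)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Deaths observed exactly at time t.
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - deaths / at_risk
        estimate[t] = survival
    return estimate


km = kaplan_meier(T, E)
for t, s in km.items():
    print(f"{t:>4}  {s:.4f}")
```

With six subjects the estimate drops to 5/6 at 2.5, 1/2 at 4, and 1/3 at 5, and stays at 1/3 through 6. The same T and E lists can be passed directly to kmf.fit(T, E).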
Somewhat related, you can convert from a survival table to T, E vectors using a utility function in the lifelines library:
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.utils import survival_events_from_table
df = pd.DataFrame([
{"observed": 1, "censored": 0, "time": 2.5},
{"observed": 2, "censored": 0, "time": 4.},
{"observed": 1, "censored": 0, "time": 5.},
{"observed": 0, "censored": 2, "time": 6.},
])
df = df.set_index("time")
T, E, W = survival_events_from_table(df)
kmf = KaplanMeierFitter().fit(T, E, weights=W)
kmf.plot()