Slowat_Kela

Reputation: 1511

How to read a pandas DataFrame into a Kaplan-Meier curve?

I am trying to replicate the Kaplan Meier table that is figure 1 here. The figure is:

[image: figure 1, the Kaplan Meier table being replicated]

This is the code I wrote:

# Python code to create the above Kaplan Meier curve
from lifelines import KaplanMeierFitter
import pandas as pd

df = pd.DataFrame({
                'T':[0,0,0,0,0,0,2.5,2.5,2.5,2.5,2.5,4,4,4,4,4,5,5,5,6,6],
                'E':[0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,0],
})
## create a kmf object
kmf = KaplanMeierFitter() 

## Fit the data into the model
kmf.fit(df['T'], df['E'],label='Kaplan Meier Estimate')

## Create an estimate
kmf.plot(ci_show=False) 

My output plot is different (see the scale):

[image: output plot]

When I print the survival function, it is different:

          Kaplan Meier Estimate
timeline                       
0.0                      1.0000
2.5                      0.9375
4.0                      0.7500
5.0                      0.6000
6.0                      0.6000

I presume I didn't translate the data into a DataFrame properly. I tried adjusting the DataFrame, adding the 1 event to the start and to the end of the time frame, but it made no difference. Can someone show me how to replicate the example I'm trying to work on?

Upvotes: 1

Views: 1232

Answers (1)

Cam.Davidson.Pilon

Reputation: 1726

The comment by @Arne is correct. There are 6 subjects, so there should only be 6 elements in each of your T and E vectors. Recall that each element of these vectors is a single subject. T represents how long that subject was observed for, and E represents whether the subject's "death" was observed or not.
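To make the one-element-per-subject idea concrete, here is a minimal sketch that builds 6-element T and E lists and computes the product-limit (Kaplan Meier) estimate by hand in plain Python. The values are an assumption: they are read off the survival table used later in this answer (1 death at 2.5, 2 deaths at 4, 1 death at 5, 2 subjects censored at 6). The `kaplan_meier` function is a hypothetical helper written for illustration, not part of lifelines.

```python
from collections import Counter

# One entry per subject (6 subjects). Assumed values, taken from the
# survival table in this answer.
T = [2.5, 4.0, 4.0, 5.0, 6.0, 6.0]  # how long each subject was observed
E = [1, 1, 1, 1, 0, 0]              # 1 = death observed, 0 = censored


def kaplan_meier(durations, events):
    """Hypothetical helper: return {time: S(time)} via the product-limit formula."""
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    survival = 1.0
    curve = {}
    for t in sorted(set(durations)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1 - deaths[t] / at_risk
        curve[t] = survival
    return curve


curve = kaplan_meier(T, E)
for t, s in curve.items():
    print(f"{t:>4}: {s:.4f}")
```

With 6 subjects the estimate drops to 5/6 at 2.5, to 1/2 at 4, and to 1/3 at 5, then stays flat at 6 because both remaining subjects are censored. Passing the same T and E to `KaplanMeierFitter().fit(T, E)` should trace the same steps.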

Somewhat related: you can convert from a survival table to T, E vectors using a utility function in the lifelines library:

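import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.utils import survival_events_from_table

# One row per event time: number of observed deaths and number censored
df = pd.DataFrame([
    {"observed": 1, "censored": 0, "time": 2.5},
    {"observed": 2, "censored": 0, "time": 4.},
    {"observed": 1, "censored": 0, "time": 5.},
    {"observed": 0, "censored": 2, "time": 6.},
])
# The event times must be the index of the table
df = df.set_index("time")

# T: durations, E: event indicators, W: weights (subject counts)
T, E, W = survival_events_from_table(df)

kmf = KaplanMeierFitter().fit(T, E, weights=W)
kmf.plot()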
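For intuition about what such a conversion means, here is a rough sketch in plain Python that expands the same table into the one-row-per-subject form. `expand_table` is a hypothetical helper, not the lifelines implementation; the library call above returns weights W rather than repeated rows.

```python
# Same survival table as above, as plain dictionaries.
table = [
    {"time": 2.5, "observed": 1, "censored": 0},
    {"time": 4.0, "observed": 2, "censored": 0},
    {"time": 5.0, "observed": 1, "censored": 0},
    {"time": 6.0, "observed": 0, "censored": 2},
]


def expand_table(rows):
    """Hypothetical helper: repeat each time once per subject in that row."""
    durations, events = [], []
    for row in rows:
        # Subjects whose death was observed at this time.
        durations += [row["time"]] * row["observed"]
        events += [1] * row["observed"]
        # Subjects who were censored at this time.
        durations += [row["time"]] * row["censored"]
        events += [0] * row["censored"]
    return durations, events


T, E = expand_table(table)
print(T)
print(E)
```

The result has exactly 6 entries in each list, one per subject, which is the shape the T and E arguments of `fit` expect when no weights are given.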

Upvotes: 1
