slightlynybbled

Reputation: 2645

scipy.stats.weibull_min.fit() - how to deal with right-censored data?

Non-Censored (Complete) Dataset

I am attempting to use the scipy.stats.weibull_min.fit() function to fit some life data. Some example generated data is stored in values below.

import numpy as np
import scipy.stats

values = np.array(
    [10197.8, 3349.0, 15318.6, 142.6, 20683.2, 
    6976.5, 2590.7, 11351.7, 10177.0, 3738.4]
)

I attempt to fit using the function:

fit = scipy.stats.weibull_min.fit(values, loc=0)

The result:

(1.3392877335100251, -277.75467055900197, 9443.6312323849124)

This isn't far from the nominal shape (beta) and scale (eta) values of 1.4 and 10000. The returned tuple is ordered (shape, loc, scale).
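In `scipy.stats`, a `loc=0` keyword passed to `fit` is only a starting guess for the location, which is why the fitted location drifted to about -278. A minimal sketch of a true two-parameter Weibull fit, holding the location at zero with `floc=0` (the exact printed numbers are not reproduced here):

```python
import numpy as np
from scipy import stats

values = np.array(
    [10197.8, 3349.0, 15318.6, 142.6, 20683.2,
     6976.5, 2590.7, 11351.7, 10177.0, 3738.4]
)

# floc=0 fixes the location parameter; loc=0 would only be an initial guess
shape, loc, scale = stats.weibull_min.fit(values, floc=0)
print(f"beta (shape) = {shape:.3f}, loc = {loc}, eta (scale) = {scale:.1f}")
```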

Right-Censored Data

The Weibull distribution is well known for its ability to handle right-censored data, which makes it very useful for reliability analysis. How do I handle right-censored data within scipy.stats? That is, how do I fit data that includes units that have not failed yet?
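For reference, the standard maximum likelihood treatment of right censoring: a failed unit contributes its density f, while a unit still running at time t contributes only the probability S that it survived past t. For a two-parameter Weibull with shape β and scale η, with F the set of failed units and C the set of censored ones:

```latex
L(\beta,\eta) = \prod_{i \in F} f(t_i;\beta,\eta)\,\prod_{j \in C} S(t_j;\beta,\eta),
\qquad f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} e^{-(t/\eta)^{\beta}},
\qquad S(t) = e^{-(t/\eta)^{\beta}}

\ln L(\beta,\eta) = \sum_{i \in F}\left[\ln\beta - \ln\eta + (\beta-1)\ln\frac{t_i}{\eta}\right]
                  - \sum_{k \in F \cup C}\left(\frac{t_k}{\eta}\right)^{\beta}
```

This is why a placeholder value inside the data array cannot work: a censored unit's running time has to enter through S, not through f.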

The input form might look like:

values = np.array(
    [10197.8, 3349.0, 15318.6, 142.6, np.inf, 
    6976.5, 2590.7, 11351.7, 10177.0, 3738.4]
)

or perhaps use np.nan, or simply 0, in place of np.inf.

Both NumPy options (np.inf and np.nan) raise RuntimeWarnings, and the fitted parameters are nowhere near the correct values. Using numeric placeholders such as 0 or -1 removes the RuntimeWarning, but the returned parameters are obviously wrong.

Other Software

In some reliability or lifetime analysis software (Minitab, lifelines), the data is given as two columns: one with the actual times and one indicating whether each item has failed yet. For instance:

values = np.array(
    [10197.8, 3349.0, 15318.6, 142.6, 0, 
    6976.5, 2590.7, 11351.7, 10177.0, 3738.4]
)

# True = failed, False = not failed yet (right-censored)
censored = np.array(
    [True, True, True, True, False,
    True, True, True, True, True]
)

I see no equivalent option in the scipy.stats documentation.
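Any SciPy version can handle this two-column layout by maximizing the censored log-likelihood directly: `weibull_min.logpdf` for failed units and `weibull_min.logsf` for units still running. A minimal sketch, assuming the location is fixed at 0; `fit_weibull_censored` is a hypothetical helper name, and the example treats the 20683.2 unit from the first dataset as still running at that time purely for illustration:

```python
import numpy as np
from scipy import optimize, stats


def fit_weibull_censored(times, failed):
    """Two-parameter Weibull MLE (loc fixed at 0) with right censoring.

    times  : time on test for every unit (failure time or running time so far)
    failed : True where the unit failed, False where it is right-censored
    """
    times = np.asarray(times, dtype=float)
    failed = np.asarray(failed, dtype=bool)

    def neg_log_lik(log_params):
        # Optimize in log space so shape and scale stay positive
        shape, scale = np.exp(log_params)
        # Failed units contribute the density, censored units the survival function
        ll = stats.weibull_min.logpdf(times[failed], shape, scale=scale).sum()
        ll += stats.weibull_min.logsf(times[~failed], shape, scale=scale).sum()
        return -ll

    # Start from shape 1 (exponential) with the mean time as the scale
    x0 = np.log([1.0, times.mean()])
    result = optimize.minimize(neg_log_lik, x0, method="Nelder-Mead")
    shape, scale = np.exp(result.x)
    return shape, scale


# Hypothetical example: the unit at 20683.2 is treated as still running
values = np.array(
    [10197.8, 3349.0, 15318.6, 142.6, 20683.2,
     6976.5, 2590.7, 11351.7, 10177.0, 3738.4]
)
failed = np.array(
    [True, True, True, True, False,
     True, True, True, True, True]
)

shape, scale = fit_weibull_censored(values, failed)
print(f"beta (shape) = {shape:.3f}, eta (scale) = {scale:.1f}")
```

Optimizing over the logarithms of the parameters is just a convenient way to keep them positive without bound constraints; a bounded optimizer would work as well.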

Upvotes: 2

Views: 2627

Answers (1)

Derryn Knife

Reputation: 104

Old question, but in case anyone comes across this: there is a new survival analysis package for Python, surpyval, that handles this and other cases of censoring and truncation. For the example you provide above, it would simply be:

import numpy as np
import surpyval as surv

values = np.array([10197.8, 3349.0, 15318.6, 142.6, 6976.5, 2590.7, 11351.7, 10177.0, 3738.4])

# 0 = failed, 1 = right censored
censored = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0])

model = surv.Weibull.fit(values, c=censored)
print(model.params)

(10584.005910580288, 1.038163987652635)

You might also be interested in the Weibull plot:

model.plot(plot_bounds=False)

[Figure: Weibull plot]

Full disclosure: I am the creator of surpyval.

Upvotes: 3
