Reputation: 2645
I am attempting to use the scipy.stats.weibull_min.fit()
function to fit some life data. Example generated data is contained below within values
.
values = np.array(
[10197.8, 3349.0, 15318.6, 142.6, 20683.2,
6976.5, 2590.7, 11351.7, 10177.0, 3738.4]
)
I attempt to fit using the function:
fit = scipy.stats.weibull_min.fit(values, loc=0)
The result:
(1.3392877335100251, -277.75467055900197, 9443.6312323849124)
Which isn't far from the nominal beta and eta values of 1.4 and 10000.
The weibull distribution is well known for its ability to deal with right-censored data. This makes it incredibly useful for reliability analysis. How do I deal with right-censored data within scipy.stats
? That is, curve fit for data that has not experienced failures yet?
The input form might look like:
values = np.array(
[10197.8, 3349.0, 15318.6, 142.6, np.inf,
6976.5, 2590.7, 11351.7, 10177.0, 3738.4]
)
or perhaps using np.nan
or simply 0
.
Both of the np
solutions are throwing RunTimeWarning
s and are definitely not coming close to the correct values. I using numeric values - such as 0
and -1
- removes the RunTimeWarning
, but the returned parameters are obviously flawed.
In some reliability or lifetime analysis softwares (minitab, lifelines), it is necessary to have two columns of data, one for the actual numbers and one to indicate if the item has failed or not yet. For instance:
values = np.array(
[10197.8, 3349.0, 15318.6, 142.6, 0,
6976.5, 2590.7, 11351.7, 10177.0, 3738.4]
)
censored = np.array(
[True, True, True, True, False,
True, True, True, True, True]
)
I see no such paths within the documentation.
Upvotes: 2
Views: 2627
Reputation: 104
Old question but if anyone comes across this, there is a new survival analysis package for python, surpyval, that handles this, and other cases of censoring and truncation. For the example you provide above it would simply be:
import surpyval as surv
values = np.array([10197.8, 3349.0, 15318.6, 142.6, 6976.5, 2590.7, 11351.7, 10177.0, 3738.4])
# 0 = failed, 1 = right censored
censored = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0])
model = surv.Weibull.fit(values, c=censored)
print(model.params)
(10584.005910580288, 1.038163987652635)
You might also be interested in the Weibull plot:
model.plot(plot_bounds=False)
Full disclosure, I am the creator of surpyval
Upvotes: 3