waldi
waldi

Reputation: 31

How to get the confidence interval of a Weibull distribution using Python?

I want to perform a probability Weibull fit with 0.95% confidence bounds by means of Python. As test data, I use fail cycles of a measurement which are plotted against the reliability R(t).

So far, I found a way to perform the Weibull fit, however, I still do not manage to get the confidence bounds. The Weibull plot with the same test data set was already performed with origin, therfore I know which shape I would "expect" for the confidence interval. But I do not understand how to get there.

example for confidence interval

I found information about Weibull confidence intervals on reliawiki(cf. Bounds on Reliability based on Fisher Matrix confidence bounds) and used the description there to calculate the variance and the upper and lower confidence bound (R_U and R_L).

Here is a working code example for my Weibull fit and my confidence bounds with the test data set based on the discription of reliawiki (cf. Bounds on Reliability). For the fit, I used a OLS model fit.

import os, sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from scipy.optimize import curve_fit
import math
import statsmodels.api as sm

def weibull_ticks(y, pos):
    return "{:.0f}%".format(100 * (1 - np.exp(-np.exp(y))))

def loglog(x):
    return np.log(-np.log(1 - np.asarray(x)))

class weibull_example(object):

    def __init__(self, dat):
        self.fits = {}
        dat.index = np.arange(1, len(dat) + 1)
        dat.sort_values('data', inplace=True)
        #define yaxis-values
        dat['percentile'] = dat.index*1/len(dat)
        self.data = dat

        self.fit()
        self.plot_data()

    def fit(self):
        #fit the data points with a the OLS model
        self.data=self.data[:-1]
        x0 = np.log(self.data.dropna()['data'].values)
        Y = loglog(self.data.dropna()['percentile'])
        Yx = sm.add_constant(Y)
        model = sm.OLS(x0, Yx)
        results = model.fit()
        yy = loglog(np.linspace(.001, .999, 100))
        YY = sm.add_constant(yy)
        XX = np.exp(results.predict(YY))
        self.eta = np.exp(results.params[0])
        self.beta = 1 / results.params[1]
        self.fits['syx'] = {'results': results, 'model': model,
                            'line': np.row_stack([XX, yy]),
                            'beta': self.beta,
                            'eta': self.eta}

        cov = results.cov_params()
        #get variance and covariance
        self.beta_var = cov[1, 1]
        self.eta_var = cov[0, 0]
        self.cov = cov[1, 0]

    def plot_data(self, fit='yx'):
        dat = self.data
        #plot data points
        plt.semilogx(dat['data'], loglog(dat['percentile']), 'o')
        fit = 's' + fit
        self.plot_fit(fit)

        ax = plt.gca()
        formatter = mpl.ticker.FuncFormatter(weibull_ticks)
        ax.yaxis.set_major_formatter(formatter)
        yt_F = np.array([0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
                         0.6, 0.7, 0.8, 0.9, 0.95, 0.99])
        yt_lnF = loglog(yt_F)
        plt.yticks(yt_lnF)

        plt.ylim(loglog([.01, .99]))

    def plot_fit(self, fit='syx'):
        dat = self.fits[fit]['line']
        plt.plot(dat[0], dat[1])

        #calculate variance to get confidence bound
        def variance(x):
            return (math.log(x) - math.log(self.eta)) ** 2 * self.beta_var + \
                   (self.beta/self.eta) ** 2 * self.eta_var - \
                   2 * (math.log(x) - math.log(self.eta)) * (-self.beta/self.eta) * self.cov

        #calculate confidence bounds
        def confidence_upper(x):
            return 1-np.exp(-np.exp(self.beta*(math.log(x)-math.log(self.eta)) - 0.95*np.sqrt(variance(x))))
        def confidence_lower(x):
            return 1-np.exp(-np.exp(self.beta*(math.log(x)-math.log(self.eta)) + 0.95*np.sqrt(variance(x))))

        yvals_1 = list(map(confidence_upper, dat[0]))
        yvals_2 = list(map(confidence_lower, dat[0]))

        #plot confidence bounds
        plt.semilogx(dat[0], loglog(yvals_1), linestyle="solid", color="black", linewidth=2,
                 label="fit_u_1", alpha=0.8)
        plt.semilogx(dat[0], loglog(yvals_2), linestyle="solid", color="green", linewidth=2,
                 label="fit_u_1", alpha=0.8)

def main():
    fig, ax1 = plt.subplots()
    ax1.set_xlabel("$Cycles\ til\ Failure$")
    ax1.set_ylabel("$Weibull\ Percentile$")

    #my data points
    data = pd.DataFrame({'data': [1556, 2595, 11531, 38079, 46046, 57357]})
    weibull_example(data)

    plt.savefig("Weibull.png")
    plt.close(fig)

if __name__ == "__main__":
    main()

The confidence bounds in my plot look not like I expected. I tried a lot of different 'variances', just to understand the function and to check, if the problem is just a typing error. Meanwhile, I am convinced that the problem is more general and that I understood something false from the description on reliawiki. Unfortunately, I really do not get what's the problem and I do not know anyone else I can ask. In the internet and on different forums, I did not find an appropriate answer.

That's why I decided to ask this question here. It's the first time I ask a question in a forum. Therefore, I hope that I explained everything sufficiently and that the code example is useful. Thank you very much :)

Upvotes: 2

Views: 2899

Answers (1)

Matthew Reid
Matthew Reid

Reputation: 362

Apologies for the very late answer, but I'll provide it for any future readers. Rather than try implementing this yourself, you may want to consider using a package designed for exactly this called reliability. Here is the example for your use case.

Remember to upvote this answer if it helps you :)

Upvotes: 1

Related Questions