jswtraveler

Reputation: 355

Predicting probability of failure mid life with weibull or ecdf using python

Thank you for taking a look at this. I have failure data for tires over a 5-year period. For each tire, I have the start date (day 0), the end date (day n), and the number of miles driven each day. I used the total miles each car drove to create 2 distributions, one Weibull and one ECDF. My hope is to use those distributions to predict the probability that a tire will fail within the next 50 miles at any point during its life. As an example, suppose it is 2 weeks into the life of a tire, the total mileage is currently 100 miles, and the average is 50 miles per week. I want to predict the probability that it will fail by 150 miles, i.e. within the next week.

My thinking is that if I can get the failure probabilities of all tires active on a given day, I can sum them to get a prediction of how many tires will need to be replaced over a given period following that day.

My current methodology is to fit a distribution to 3 years of failure data using scipy.stats.weibull_min and the ECDF from statsmodels. Then, if a tire is currently at 100 miles and we expect the next week to add 50 miles, I take the CDF at 150.

However, when I run this across all tires that are on the road on the date I am predicting from and sum their respective probabilities, I get a prediction that is ~50% higher than the actual number of tire replacements. My first thought is that it is an issue with my methodology. Does it sound valid, or am I doing something dumb?

Upvotes: 0

Views: 1955

Answers (1)

Matthew Reid

Reputation: 362

This might be too late of a reply but perhaps it will help someone in the future reading this. If you are looking to make predictions, you need to fit a parametric model (like the Weibull Distribution). The ecdf (Empirical CDF / Nonparametric model) will give you an indication of how well the parametric model fits but it will not allow you to make any future predictions.

To fit the parametric model, I recommend you use the Python reliability library. This library makes it fairly straightforward to fit a parametric model (especially if you have right censored data) and then use the fitted model to make the kind of predictions you are trying to make. Older versions of SciPy won't handle censored data (support was added to scipy.stats in version 1.11 through CensoredData).
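To give a feel for what fitting with right censored data involves, here is a minimal pure-Python sketch of the Weibull maximum likelihood equations with right censoring. It is only an illustration under stated assumptions: fit_weibull_censored is a hypothetical helper (not part of reliability or SciPy), the bracket for beta is assumed, and the censored times are invented. In the reliability library itself you would normally just pass the censored times to the fitter through its right_censored argument.

```python
import math

def fit_weibull_censored(failures, right_censored=()):
    """Weibull 2P maximum likelihood with right censoring (hypothetical helper)."""
    times = list(failures) + list(right_censored)
    r = len(failures)  # only observed failures count here
    t_max = max(times)
    mean_log_fail = sum(math.log(t) for t in failures) / r

    def score(beta):
        # Profile-likelihood equation for beta. Times are scaled by t_max
        # so that large powers do not overflow; the ratio s1 / s0 is unchanged.
        w = [(t / t_max) ** beta for t in times]
        s0 = sum(w)
        s1 = sum(wi * math.log(t) for wi, t in zip(w, times))
        return s1 / s0 - 1.0 / beta - mean_log_fail

    lo, hi = 0.01, 100.0  # assumed search bracket for beta
    for _ in range(200):  # plain bisection; score() is increasing in beta
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    # Closed-form alpha for a given beta: alpha**beta = sum(t**beta) / r
    alpha = t_max * (sum((t / t_max) ** beta for t in times) / r) ** (1.0 / beta)
    return alpha, beta

# The 10 failure times used further down in this answer
failures = [113, 126, 91, 110, 146, 147, 72, 83, 57, 104]
# Hypothetical right censored times: tires still on the road, not yet failed
still_running = [120, 130, 150]

alpha_u, beta_u = fit_weibull_censored(failures)
alpha_c, beta_c = fit_weibull_censored(failures, still_running)
print('No censoring:  ', alpha_u, beta_u)
print('With censoring:', alpha_c, beta_c)
```

With no censored times this should agree, to within rounding, with the point estimates from Fit_Weibull_2P shown further down. Adding the tires that are still running pushes the characteristic life up, which is why ignoring them tends to make a fleet look less reliable than it is.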

If you have failure data for a population of tires then you will be able to fit a model. The question you asked (about the probability of failure in the next week given that it has survived 2 weeks) is called conditional survival. Essentially you want CS(1|2) which means the probability it will survive 1 more week given that it has survived to week 2. You can find this as the ratio of the survival functions (SF) at week 3 and week 2: CS(1|2) = SF(2+1)/SF(2).
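Applied to the mileage example in the question, the difference between taking the CDF at 150 miles and conditioning on the 100 miles already survived can be sketched as follows. This is a rough illustration only: the Weibull parameters are made up, and weibull_sf and conditional_failure are hypothetical helper names.

```python
import math

def weibull_sf(t, alpha, beta):
    """Weibull survival function SF(t) = exp(-(t / alpha) ** beta)."""
    return math.exp(-((t / alpha) ** beta))

def conditional_failure(age, extra, alpha, beta):
    """P(fail in (age, age + extra] | survived to age) = 1 - SF(age + extra) / SF(age)."""
    return 1.0 - weibull_sf(age + extra, alpha, beta) / weibull_sf(age, alpha, beta)

alpha, beta = 300.0, 2.0  # hypothetical parameters, in miles

# CDF at 150 miles: probability that a brand new tire fails by 150 miles.
# This ignores the fact that the tire has already survived 100 miles.
unconditional = 1.0 - weibull_sf(150, alpha, beta)

# Probability that a tire currently at 100 miles fails within the next 50 miles.
conditional = conditional_failure(100, 50, alpha, beta)

print('CDF(150):', unconditional)
print('Conditional failure, 100 -> 150 miles:', conditional)
```

The unconditional value is always at least as large as the conditional one, so summing CDF values over tires that are already in service will overestimate the number of replacements.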

Let's take a look at some code using the Python reliability library. I'll assume we have 10 failure times that we will use to fit our distribution and from that I'll find CS(1|2):

from reliability.Fitters import Fit_Weibull_2P

data = [113, 126, 91, 110, 146, 147, 72, 83, 57, 104] # failure times (in weeks) of some tires from our vehicle fleet
fit = Fit_Weibull_2P(failures=data, show_probability_plot=False)
CS_1_2 = fit.distribution.SF([3])[0] / fit.distribution.SF([2])[0]  # conditional survival
CF_1_2 = 1 - CS_1_2  # conditional failure
print('Probability of failure in the next week given it has survived 2 weeks:', CF_1_2)

'''
Results from Fit_Weibull_2P (95% CI):
           Point Estimate  Standard Error   Lower CI    Upper CI
Parameter                                                       
Alpha          115.650803        9.168086  99.008075  135.091084
Beta             4.208001        1.059183   2.569346    6.891743
Log-Likelihood: -47.5428956288772 

Probability of failure in the next week given it has survived 2 weeks: 1.7337430857633507e-07
'''

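Let's now assume you have 250 vehicles in your fleet, each with 4 tires (1000 tires in total), and that every tire is 2 weeks old. The probability of any 1 tire failing in the next week is CF_1_2 = 1.7337430857633507e-07. Because this probability is small and the fleet is large, the number of failures is approximately Poisson distributed with mean CF_1_2 * 1000, so we can find the probability of X tires failing (throughout the fleet of 1000 tires) like this: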

X = [0, 1, 2, 3, 4, 5]
from scipy.stats import poisson
print('n failed     probability')
for x in X:
    PF = poisson.pmf(k=x, mu=CF_1_2 * 1000)
    print(x, '          ', PF)

'''
n failed     probability
0            0.9998266407198806
1            0.00017334425253100934
2            1.502671996412269e-08
3            8.684157279833254e-13
4            3.764024409898102e-17
5            1.305170259061071e-21
'''

These numbers make sense because I generated the data from a Weibull distribution with a characteristic life (alpha) of 100 weeks, so we'd expect the probability of failure during week 3 to be very low.
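The Poisson calculation above treats every tire as being 2 weeks old. For a fleet of mixed ages, as in the question, the expected number of replacements in the next week is the sum of each tire's own conditional failure probability. Here is a minimal sketch, assuming the fitted point estimates printed above; the tire ages are invented and conditional_failure is a hypothetical helper name.

```python
import math

ALPHA, BETA = 115.650803, 4.208001  # point estimates from the fit above

def conditional_failure(age, extra, alpha=ALPHA, beta=BETA):
    """1 - SF(age + extra) / SF(age) for a Weibull distribution.

    Written with expm1 so that very small probabilities keep their precision.
    """
    return -math.expm1((age / alpha) ** beta - ((age + extra) / alpha) ** beta)

tire_ages = [2, 30, 60, 90, 110]  # hypothetical current ages, in weeks

# Probability that each tire fails during the next week, given its current age
probs = [conditional_failure(age, 1) for age in tire_ages]

# Expected number of failures in the next week across these tires
expected_failures = sum(probs)

for age, p in zip(tire_ages, probs):
    print(age, 'weeks old:', p)
print('Expected failures next week:', expected_failures)
```

Since beta is greater than 1 the hazard increases with age, so the older tires dominate the sum. The same sum can be used as the mean of a Poisson approximation when the individual probabilities are all small.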

If you have further questions, feel free to email me directly.

Upvotes: 1
