I am attempting to write a function which should solve the integral.
The data points that are used follow a power-law process and can become quite large numbers. The data are simulated using the following code:
import math
import numpy as np

def simulate_data_constant(beta1, beta2, lam1, sets, total_failures, change):
    betha1 = beta1
    betha2 = beta2
    lambda1 = lam1
    lambda2 = lam1
    n_traj = sets
    n = total_failures
    x = change
    t1 = np.zeros((n_traj, n))
    for k in range(n_traj):
        t1[k, 0] = (-np.log(np.random.uniform(0, 1)) / lambda1)**(1 / betha1)
        for j in range(1, x):
            t1[k, j] = (-np.log(np.random.uniform(0, 1)) / lambda1 + (t1[k, j-1])**betha1)**(1 / betha1)
        for j in range(x, n):
            t1[k, j] = (-np.log(np.random.uniform(0, 1)) / lambda2 + (t1[k, j-1])**betha2)**(1 / betha2)
    return t1
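For reference, a call looks like this (the parameter values here are hypothetical, just for illustration):

times_mat = simulate_data_constant(beta1=1.5, beta2=0.8, lam1=0.01,
                                   sets=10, total_failures=50, change=25)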
To compute the integral I first used SciPy's tplquad, but Python gave me a message saying that the integral does not converge. I therefore attempted Monte Carlo integration with the following code:
def K1_log(tau, beta1, beta2, n, eps, v, T):
    K1_threshold = 700
    val = (n + eps) * np.log(np.power(tau, beta1) + np.power(T, beta2) - np.power(tau, beta2) + v)
    # return np.where(val >= K1_threshold, K1_threshold, val)
    return val

def q_j_log(j, beta1, beta2, times):
    prod1_log = beta1 * np.sum(np.log(times[:j]))
    prod2_log = beta2 * np.sum(np.log(times[j:]))
    return prod1_log + prod2_log

def integral1_log(beta1, beta2, tau, n, eps, v, T, times):
    K1_values = K1_log(tau, beta1, beta2, n, eps, v, T)
    log_prod_term = beta2 * np.sum(np.log(times))
    print(K1_values - log_prod_term)  # debug output
    return 2*np.log(2) - 3*(np.log(1 + beta1) + np.log(1 + beta2)) + n*np.log(beta2) + log_prod_term - K1_values - np.log(T)

def integral2_log(beta1, beta2, tau, j, n, eps, v, T, times):
    K1_values = K1_log(tau, beta1, beta2, n, eps, v, T)
    q_j_values = q_j_log(j, beta1, beta2, times)
    return 2*np.log(2) - 3*(np.log(1 + beta1) + np.log(1 + beta2)) + j*np.log(beta1) + (n - j)*np.log(beta2) + q_j_values - K1_values - np.log(T)

def integral3_log(beta1, beta2, tau, n, eps, v, T, times):
    K1_values = K1_log(tau, beta1, beta2, n, eps, v, T)
    log_prod_term = beta1 * np.sum(np.log(times))
    return 2*np.log(2) - 3*(np.log(1 + beta1) + np.log(1 + beta2)) + n*np.log(beta1) + log_prod_term - K1_values - np.log(T)
def Bayes(times_mat, sets, total_failures):
    n_traj = sets
    n = total_failures
    v = 1
    eps = 0.1
    num_samples = 1000
    for k in range(n_traj):
        times = times_mat[k]
        T = math.ceil(times[-1])
        samples_tau = np.random.uniform(0, times[0], num_samples)
        samples_beta1 = np.random.uniform(0, 3, num_samples)
        samples_beta2 = np.random.uniform(0, 3, num_samples)
        integral_1_values_log = integral1_log(samples_beta1, samples_beta2, samples_tau, n, eps, v, T, times)
        integral_1_values = np.exp(integral_1_values_log)
        A1 = np.mean(integral_1_values) * times[0]
        A2 = 0
        for j in range(1, n - 1):
            samples_tau = np.random.uniform(times[j], times[j+1], num_samples)
            integral_2_values_log = integral2_log(samples_beta1, samples_beta2, samples_tau, j, n, eps, v, T, times)
            integral_2_values = np.exp(integral_2_values_log)
            A2 += np.mean(integral_2_values) * (times[j+1] - times[j])
        samples_tau = np.random.uniform(times[-1], T, num_samples)
        integral_3_values_log = integral3_log(samples_beta1, samples_beta2, samples_tau, n, eps, v, T, times)
        integral_3_values = np.exp(integral_3_values_log)
        A3 = np.mean(integral_3_values) * (T - times[-1])
        A = A1 + A2 + A3
The implementation above includes a logarithmic transformation of the calculations, because the values of x_i were too large and produced inf values. This works for some cases of the simulated data, but for other cases I still get inf as an outcome. Is there a different way I can compute this integral, which would work for all simulated data?
First, you could have a problem in your code, namely

-np.log(np.random.uniform(0,1))

Why don't you use exponential distribution sampling? Or at least replace np.random.uniform(0,1) with 1 - np.random.uniform(0,1): NumPy's uniform samples from the half-open interval [0, 1), so it can return exactly 0 and make the log blow up. Note also that all the sampling in simulate_data_constant uses the exponential distribution, not a power law.
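A minimal sketch of what I mean, using NumPy's Generator API (the rate value is hypothetical, for illustration only):

import numpy as np

rng = np.random.default_rng(42)
lam1 = 0.5  # hypothetical rate, for illustration only

# Two equivalent ways to draw Exp(lam1) samples:
e_direct = rng.exponential(scale=1.0 / lam1, size=5)  # direct sampling
u = 1.0 - rng.random(size=5)                          # maps [0, 1) to (0, 1], avoids log(0)
e_inverse = -np.log(u) / lam1                         # inverse-CDF method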
You could try to sample β1 or β2 from their PDFs while doing MC integration. This is a general principle in MC: if you have an integral

∫ f(x) g(x) dx,

and g(x) is (or could be made into) some PDF(x), you could sample from this PDF and just compute the mean of f(x) as the value of the integral. This will bring serious improvements to convergence, because most samples would concentrate closer to 0. Tau would still be sampled linearly.
So rewrite your integrals as

I = ∫∫∫ PDF(β1) PDF(β2) Leftovers12 ...

PDF(β1) = 2/(1 + β1)^3
PDF(β2) = 2/(1 + β2)^3
Sampling is simple:
β1,2 = √(1/U01) − 1, where U01 is a uniform number in the (0...1] range.
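As a quick toy check of this principle (my own illustration, with f(β) = exp(−β) standing in for the leftovers):

import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(0)
u = 1.0 - rng.random(1_000_000)      # uniform in (0, 1]
beta = np.sqrt(1.0 / u) - 1.0        # samples from PDF(β) = 2/(1+β)^3

print(np.mean(np.exp(-beta)))        # MC estimate of ∫ exp(-β) PDF(β) dβ, no weights needed
print(quad(lambda b: np.exp(-b) * 2.0 / (1.0 + b)**3, 0, np.inf)[0])  # quadrature reference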
Some code, Python 3.11 Windows x64:
import numpy as np
from matplotlib import pyplot as plt

def BetaPDF(beta):
    return 2.0 / (1.0 + beta)**3

def sampleBetaPDF(rng, N):
    u01 = rng.random(size=N)
    u01 = 1.0 - u01  # map [0...1) -> (0...1] to avoid div by zero
    beta = np.sqrt(1.0 / u01) - 1.0
    return beta

rng = np.random.default_rng(122807528840384100672342137672332424406)

N = 100000
rnge = (0.0, 10.0)
bins = np.linspace(start=rnge[0], stop=rnge[1], num=101)

data = sampleBetaPDF(rng, N)
hist, bin_edges = np.histogram(data, bins=bins, density=True)
print(hist.sum())
print(np.sum(hist * np.diff(bin_edges)))

fig = plt.figure(figsize=(10, 7))
plt.hist(data, bins=bins, density=True)

beta = 0.5 * (bin_edges[:-1] + bin_edges[1:])
bpdf = BetaPDF(beta)
plt.scatter(beta, bpdf, c='#9467bd')

plt.title("Beta Sampling vs PDF")
plt.show()
And it will produce the graph below: a histogram of the β samples with the PDF overlaid.
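Finally, a hedged sketch of how sampleBetaPDF could plug into the question's MC loop (assuming integral1_log and the variables times, n, eps, v, T, num_samples from the question are in scope). The 2*log(2) − 3*log(1+β1) − 3*log(1+β2) terms in integral1_log are exactly log PDF(β1) + log PDF(β2), so with importance sampling you drop them and average only the leftover factor:

samples_beta1 = sampleBetaPDF(rng, num_samples)
samples_beta2 = sampleBetaPDF(rng, num_samples)
samples_tau = rng.uniform(0.0, times[0], num_samples)  # tau is still sampled linearly

log_pdf = 2*np.log(2) - 3*(np.log1p(samples_beta1) + np.log1p(samples_beta2))
log_leftover = integral1_log(samples_beta1, samples_beta2, samples_tau,
                             n, eps, v, T, times) - log_pdf
A1 = np.mean(np.exp(log_leftover)) * times[0]  # times[0] = width of the tau interval

The other two integrals would be rewritten the same way. A side benefit: the β samples then cover the full (0, ∞) support of the PDF instead of a truncated uniform range.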