Reputation: 11
I have data, only discrete data points, and I know the form of its distribution, i.e.:
y = w * gamma.pdf(x, alpha1, scale=scale1) + (1-w) * gamma.pdf(x, alpha2, scale=scale2)
How do you accurately infer these five parameters? My code is as follows:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import gaussian_kde,gamma
data1 = np.random.gamma(4, 1, 200)
data2 = np.random.gamma(6, 2, 200)
data = np.concatenate((data1, data2))
x = np.linspace(0, np.max(data) + 2, 800)
y = gaussian_kde(data)(x)
def two_gamma(x, w, alpha1, scale1, alpha2, scale2):
    return w * gamma.pdf(x, alpha1, scale=scale1) + (1 - w) * gamma.pdf(x, alpha2, scale=scale2)
initial_params = [0.5, 2, 1, 2, 1]
params, params_covariance = curve_fit(two_gamma, x, y, p0=initial_params, maxfev=50000)
w, alpha1, scale1, alpha2, scale2 = params
plt.figure(figsize=(10, 6))
sns.histplot(data, bins=20, kde=False, color='y', label='Data density', alpha=0.5, stat='probability')
plt.plot(x, y, marker='o', linestyle='', markersize=1, label='Data distribution')
y_fit = w * gamma.pdf(x, alpha1, scale=scale1) + (1 - w) * gamma.pdf(x, alpha2, scale=scale2)
plt.plot(x, y_fit, 'r-', linewidth=1, alpha=0.7,label='Mixture gamma distribution')
plt.legend(fontsize=8, loc='upper right')
plt.title("Expression distribution of gamma mixture")
plt.xlabel("Expression")
Fitting with curve_fit this way does not reproduce the double peak, and I hope to approximate the parameters of the function precisely. Neural networks and other methods are also acceptable.
Upvotes: 1
Views: 47
Reputation: 11042
Let's state your model as follows:
import numpy as np
import pandas as pd
from scipy import stats, optimize
import matplotlib.pyplot as plt
def model(p, data):
    # Two-component gamma mixture, p = [w, alpha1, scale1, alpha2, scale2]
    return p[0] * stats.gamma.pdf(data, p[1], scale=p[2]) + (1 - p[0]) * stats.gamma.pdf(data, p[3], scale=p[4])
This is itself a PDF, since it is a weighted sum of PDFs whose weights sum to unity.
For the given set of parameters:
p = [0.5, 4, 1, 6, 2]
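As a quick sanity check (a minimal sketch, not part of the original answer), we can verify numerically that the mixture integrates to one:
from scipy import integrate

# The mixture PDF should integrate to ~1.0 over its support [0, inf)
area, _ = integrate.quad(lambda x: model(p, x), 0, np.inf)
print(area)  # ~1.0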
Assume we randomly sample from this distribution (a naive implementation, computationally heavy):
class BiGamma(stats.rv_continuous):
    def _pdf(self, x, w, a1, s1, a2, s2):
        return model([w, a1, s1, a2, s2], x)
bigamma = BiGamma(shapes='w, a1, s1, a2, s2')
law = bigamma(*p)
data = law.rvs(3000)
pd.DataFrame({"x": data}).to_csv("data.csv", index=False)
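As an aside (a sketch, not the method used above), sampling a mixture is much faster if we first draw a component label per sample and then draw from the selected gamma:
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
n = 3000
# True with probability w = p[0]: sample comes from the first component
labels = rng.random(n) < p[0]
data_fast = np.where(
    labels,
    rng.gamma(p[1], p[2], size=n),  # first component: shape 4, scale 1
    rng.gamma(p[3], p[4], size=n),  # second component: shape 6, scale 2
)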
Or use this sample for convenience:
data = pd.read_csv("https://pastebin.com/raw/1JpzfT2E")["x"].values
If the sampled data are IID, we can perform MLE (maximum likelihood estimation) to regress the parameters:
def likelihood(p, data):
    # Negative log-likelihood (log10 differs from ln only by a constant
    # factor, so it has the same minimizer)
    return - np.sum(np.log10(model(p, data)))
p0 = [1, 5, 1, 5, 1]
sol = optimize.minimize(likelihood, x0=p0, args=(data,), bounds=[(0, 1)] + [(0, np.inf)] * 4)
# message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
# success: True
# status: 0
# fun: 3783.3002631839786
# x: [ 5.316e-01 5.375e+00 2.137e+00 4.123e+00 9.111e-01]
# nit: 31
# jac: [-3.060e-02 4.502e-03 1.546e-02 -1.396e-02 -4.925e-02]
# nfev: 192
# njev: 32
# hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
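A short sketch (not in the original answer) to unpack the estimate: the components come back in swapped order relative to p, which is expected since mixture components are interchangeable (label switching):
w, a1, s1, a2, s2 = sol.x
# (a1, s1) ~ (6, 2) and (a2, s2) ~ (4, 1): the true components in swapped
# order, which is fine because the mixture is invariant under relabeling.
print(f"w={w:.3f}, gamma1=({a1:.3f}, {s1:.3f}), gamma2=({a2:.3f}, {s2:.3f})")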
Notice that we need to choose the initial value carefully and provide bounds in order to make the algorithm converge.
If you only have binned data (a histogram), we can instead perform NLLS (non-linear least squares) directly on the binned data rather than on the samples:
freq, bins = np.histogram(data, density=True, bins=35)
centers = (bins[:-1] + bins[1:]) * 0.5
def model_proxy(x, w, a1, s1, a2, s2):
    # Flat-signature wrapper so curve_fit can pass parameters individually
    return model([w, a1, s1, a2, s2], x)
popt, pcov = optimize.curve_fit(model_proxy, centers, freq, bounds=[[0] * 5, [1] + [np.inf] * 4])
# (array([0.51763272, 5.44778038, 2.14335038, 3.82036594, 1.00628071]),
# array([[ 0.00066593, -0.01337379, 0.00424389, 0.00256379, -0.00140308],
# [-0.01337379, 0.29725577, -0.09720244, -0.04583584, 0.0263838 ],
# [ 0.00424389, -0.09720244, 0.03236382, 0.0140486 , -0.00816902],
# [ 0.00256379, -0.04583584, 0.0140486 , 0.01511191, -0.00692502],
# [-0.00140308, 0.0263838 , -0.00816902, -0.00692502, 0.00344378]]))
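As a side note (a minimal sketch, not part of the original answer), the diagonal of pcov provides variance estimates that translate into one-sigma uncertainties:
# Standard errors on [w, a1, s1, a2, s2] from the covariance diagonal
perr = np.sqrt(np.diag(pcov))
print(perr)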
As with MLE, NLLS needs an educated guess and bounds for the algorithm to converge.
Finally, we render the solutions against the data:
xlin = np.linspace(data.min(), data.max(), 200)
fig, axe = plt.subplots()
axe.hist(data, density=True, bins=35, alpha=0.75, label="Raw Data")
axe.plot(xlin, model(p, xlin), label="Model")
axe.plot(xlin, model(sol.x, xlin), label="MLE Fit")
axe.scatter(centers, freq, marker='.', label="Binned Data")
axe.plot(xlin, model_proxy(xlin, *popt), label="NLLS Fit")
axe.legend()
axe.grid()
Upvotes: 0