Reputation: 717
I have a set of particle size data that is binned by size and normalized by the bin width. I'd like to fit a lognormal distribution to this data, but I'm having some problems. Most software (scipy.stats.lognorm.fit, for example) expects the raw data, and there doesn't seem to be a way to do the same fit with already-binned data.
What would be the best way to fit a lognormal distribution to this data? I have made a CSV file with the data available: https://drive.google.com/file/d/1wxsJuyu7rv0VQBHAYyreZmQqKiIZ7dz5/view?usp=sharing
Upvotes: 0
Views: 400
Reputation: 717
The best way I found to do this was to use scipy.optimize.curve_fit. Some sample code is below, using the example data above:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from scipy.optimize import curve_fit
# Define the lognormal PDF as a function of x, mu, and sigma
def lognorm_fit(x, mu, sigma):
    y = (1/(sigma*x*np.sqrt(2*np.pi))) * np.exp(-1* ((np.log(x) - mu)**2/(2*sigma**2)))
    return y
# Open data
df = pd.read_csv('./data.csv')
# Read data, using the lower bin limit as x
count = df['count']
x = df['lower_bin_limit']
width = df['upper_bin_limit'] - df['lower_bin_limit']
# Divide by bin width and normalize by total count
y = count/width/(count.sum())
# Use curve_fit to find the two parameters of the lognorm_fit function.
# curve_fit returns the optimal parameters (mu, sigma) and their
# covariance matrix, which isn't needed here.
(mu,sigma), _ = curve_fit(lognorm_fit, xdata=x, ydata=y)
print(mu, sigma)
# Generate lognorm PDF using values of mu, sigma
yy = lognorm_fit(x, mu, sigma)
# Plot Results
plt.plot(x,y)
plt.plot(x, yy)
plt.xscale('log')
plt.show()
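One way to check that this approach recovers sensible parameters is to run it on synthetic data where the answer is known. The sketch below is not the CSV from the question: it draws samples from a lognormal with assumed values true_mu and true_sigma, bins them on assumed log-spaced edges, and applies the same normalization. It also makes two changes that are my own assumptions rather than part of the code above: each bin is represented by its geometric midpoint instead of its lower edge, and curve_fit is given a starting guess p0 computed from the count-weighted mean and spread of log(x).

```python
import numpy as np
from scipy.optimize import curve_fit


def lognorm_pdf(x, mu, sigma):
    """Lognormal PDF with log-mean mu and log-standard-deviation sigma."""
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))


# Synthetic stand-in for the CSV: samples from a known lognormal, then binned.
rng = np.random.default_rng(0)
true_mu, true_sigma = 1.0, 0.5  # assumed values, chosen only for this check
samples = rng.lognormal(mean=true_mu, sigma=true_sigma, size=100_000)

edges = np.logspace(-1, 1.5, 41)  # 40 logarithmically spaced bins (assumed)
count, _ = np.histogram(samples, bins=edges)
lower, upper = edges[:-1], edges[1:]

# Same normalization as above: counts per unit width, divided by the total count.
y = count / (upper - lower) / count.sum()

# Assumption: represent each bin by its geometric midpoint, not its lower edge.
x = np.sqrt(lower * upper)

# Starting guess from the count-weighted mean and spread of log(x).
mu0 = np.average(np.log(x), weights=count)
sigma0 = np.sqrt(np.average((np.log(x) - mu0) ** 2, weights=count))

(mu_fit, sigma_fit), _ = curve_fit(lognorm_pdf, x, y, p0=(mu0, sigma0))
print(mu_fit, sigma_fit)
```

If the fitted values land close to true_mu and true_sigma, the binning and normalization steps are probably doing what you expect. Using the lower bin limit as x shifts every point toward smaller sizes, so with wide bins it may be worth comparing both choices on your own data.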
Upvotes: 1