Fit lognormal function to count data

Question

I have a set of particle size data that are binned by size and normalized by the bin width. I'd like to fit a lognormal distribution function to these data, but I'm having some problems. Most software (scipy.stats.lognorm.fit, for example) expects the raw, unbinned samples, and there doesn't seem to be a way to do the same fit with already-binned data.

What would be the best way to fit these data to a lognormal distribution? I've made a CSV file with the data available: https://drive.google.com/file/d/1wxsJuyu7rv0VQBHAYyreZmQqKiIZ7dz5/view?usp=sharing

lsterzinger · Accepted Answer

The best way I found to do this was to use scipy.optimize.curve_fit. Some sample code is below, using the example data above:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from scipy.optimize import curve_fit

# Define lognormal PDF as function of x, mu, and sigma
def lognorm_fit(x, mu, sigma):
    y = (1/(sigma*x*np.sqrt(2*np.pi))) * np.exp(-1* ((np.log(x) - mu)**2/(2*sigma**2)))
    return y

# Open data
df = pd.read_csv('./data.csv')

# Read data, using the lower bin limit as x
count = df['count']
x = df['lower_bin_limit']
width = df['upper_bin_limit'] - df['lower_bin_limit']

# Divide by bin width and normalize by total count so that y is a
# probability density comparable to the lognormal PDF
y = count/width/(count.sum())

# Use curve_fit to find the two parameters of the lognorm_fit function.
# curve_fit returns the optimal parameters (mu, sigma) and their
# covariance matrix, which isn't needed here.
(mu,sigma), _ = curve_fit(lognorm_fit, xdata=x, ydata=y)
print(mu, sigma)

# Generate lognorm PDF using values of mu, sigma
yy = lognorm_fit(x, mu, sigma)

# Plot Results
plt.plot(x,y)
plt.plot(x, yy)
plt.xscale('log')
plt.show()
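
Since the CSV may not be available to everyone, here is a minimal sketch of the same curve_fit approach on synthetic data, so the recovered parameters can be compared against known ones. Everything specific in it is an assumption for illustration, not part of the original data: the values true_mu and true_sigma, the log-spaced bin edges, and the sample size. It also departs from the code above in two ways: it uses the geometric bin center as x instead of the lower bin limit, and it passes an initial guess p0 computed from the binned counts rather than relying on the curve_fit default.

```python
import numpy as np
from scipy.optimize import curve_fit

def lognorm_pdf(x, mu, sigma):
    """Lognormal PDF, the same functional form as lognorm_fit above."""
    return (np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2))
            / (sigma * x * np.sqrt(2 * np.pi)))

# Synthetic stand-in for the CSV (assumed values, not the poster's data)
true_mu, true_sigma = 1.0, 0.5
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=true_mu, sigma=true_sigma, size=100_000)

# Bin into 40 log-spaced bins (assumed edges), then normalize as above:
# counts divided by bin width and by the total count
edges = np.logspace(-1, 2, 41)
count, _ = np.histogram(samples, bins=edges)
width = np.diff(edges)
y = count / width / count.sum()

# Use the geometric center of each bin as x
x = np.sqrt(edges[:-1] * edges[1:])

# Initial guess from the count-weighted mean and spread of log(x)
mu0 = np.average(np.log(x), weights=count)
sigma0 = np.sqrt(np.average((np.log(x) - mu0) ** 2, weights=count))

(mu_fit, sigma_fit), pcov = curve_fit(lognorm_pdf, x, y, p0=(mu0, sigma0))
print("fitted:", mu_fit, sigma_fit)
print("1-sigma uncertainties:", np.sqrt(np.diag(pcov)))
```

If the fitted values land close to true_mu and true_sigma, that is some reassurance the normalization and choice of x are sensible. Note that mu and sigma here describe log(x); if you want to compare against scipy.stats.lognorm, its shape parameter s corresponds to sigma and its scale to exp(mu), with loc fixed at 0.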
