Reputation: 45
I have this code that aims to fit the data linked here: DriveFilelink
The function read_numeric_data is used to extract the information in a structured manner. However, I'm having trouble with the Gaussian fit, as the discrepancies in the Gaussian fit image show. The issue appears to stem from the constraints imposed on certain parameters: the means of all the Gaussians and three of the four amplitudes are fixed. These constraints are nevertheless necessary, because they are based on available information.
Does anyone know how I can fix this problem in order to get a better fit?
The code is the following:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
from scipy.optimize import curve_fit
from google.colab import drive
import os
# Mount Google Drive
drive.mount('/content/drive')
# Function to read numeric data from a file
def read_numeric_data(file_path):
    with open(file_path, 'r', encoding='latin1') as f:
        data = []
        live_time = None
        real_time = None
        for line in f:
            if 'LIVE_TIME' in line:
                live_time = line.strip()
            elif 'REAL_TIME' in line:
                real_time = line.strip()
            elif all(char.isdigit() or char in ".-+e" for char in line.strip()):
                row = [float(x) for x in line.split()]
                data.extend(row)
    return np.array(data), live_time, real_time
file_path = '/content/drive/MyDrive/ProjetoXRF_Manuel/April2024/20240402_In.mca'
data, _, _ = read_numeric_data(file_path)
# Energy calibration: E = a + b * channel
a = -0.0188026396003431
b = 0.039549044037714
Data = data
# Sum of four Gaussians with fixed means and fixed amplitude ratios;
# only params[0] (the first amplitude) and the sigmas at params[2], params[5], ... are used
def multi_gaussian(x, *params):
    eps = 1e-10  # guards against division by zero for a degenerate sigma
    y = np.zeros_like(x)
    amplitude_relations = [1, 0.5538, 0.1673, 0.1673 * 0.5185]
    meanvalues = [24.210, 24.002, 27.276, 27.238]
    amplitude = params[0]
    for i in range(4):
        sigma = params[i * 3 + 2]  # fitted standard deviation for this Gaussian
        y += amplitude * amplitude_relations[i] * np.exp(-(x - meanvalues[i])**2 / (2 * sigma**2 + eps))
    return y
sigma = []
area = []
# Plot the convolved energy spectrum
def plot_convolved_spectrum(Data, a, b, i, ax=None):
    maxim = np.max(Data)
    workingarray = np.squeeze(Data)
    # Locate the maxima
    peaks, _ = find_peaks(workingarray / maxim, height=0.1)
    peak_values = workingarray[peaks] / maxim
    peak_indices = peaks
    # Energy values corresponding to the peaks
    energy_spectrum = a + b * peak_indices
    # Prepare the data for the fit
    data = workingarray[:885] / maxim
    data_y = data / data.max()
    data_x = a + b * np.linspace(0, 885, num=len(data_y))
    # Fit multiple Gaussians to the energy spectrum
    peak_indices2, _ = find_peaks(data_y, height=0.1)
    peak_amplitudes = [1, 0.5538, 0.1673, 0.1673 * 0.5185]
    peak_means = [24.210, 24.002, 27.276, 27.238]
    peak_sigmas = [0.1] * 4
    params_init = list(zip(peak_amplitudes, peak_means, peak_sigmas))
    params_init = np.concatenate(params_init)
    # Fit the curve
    params, params_cov = curve_fit(multi_gaussian, data_x, data_y, p0=params_init)
    # High-resolution interpolation of the fit
    x_fine = np.linspace(data_x.min(), data_x.max(), num=20000)
    y_fit = multi_gaussian(x_fine, *params)
    # Plot the data
    ax.scatter(data_x, data_y, color='black', marker='o', s=20, label='Data')
    y_data_error = np.sqrt(workingarray[:885])
    ax.plot(data_x, data_y + y_data_error / maxim, color='black', linestyle='--')
    ax.plot(data_x, data_y - y_data_error / maxim, color='black', linestyle='--')
    # Plot the fit
    ax.plot(x_fine, y_fit, label="Fit", linewidth=1.5)
    # Extract the standard deviations
    sigmas_array = params[2::3]
    # Store the average sigma
    sigma.append(np.mean(sigmas_array))
    # Plot configuration
    ax.set_xlabel('Energy (keV)')
    ax.set_ylabel('Normalized Data')
    ax.legend()
    ax.set_title('Convolved Energy Spectrum')
    # Print information
    print("Standard deviations:", sigmas_array)
fig, ax = plt.subplots()
plot_convolved_spectrum(Data, a, b, 1, ax=ax)
ax.set_xlim(22, 28)
plt.show()
Note: I've experimented with setting bounds and refining initial guesses to improve the Gaussian fit. However, working with bounds isn't equivalent to fixing parameters. My aim is to keep just two degrees of freedom: the amplitude of the first Gaussian and the standard deviation of the Gaussians. While exploring these adjustments, I'm trying to strike a balance between constraint and flexibility to achieve the desired fit accuracy.
The parameters of the Gaussian function are defined as follows: A represents the amplitude, x_0 denotes the mean value, and sigma represents the standard deviation. In the context of analyzing energy spectra, I have prior knowledge of the mean values and of the relationships between the amplitudes. Consequently, I want to constrain certain parameters (A and the means) based on this known information. Specifically, I intend to fit only the first amplitude and then apply the known ratios, such as A2 = 0.5538 * A1 for the second peak, and then fit the corresponding standard deviation.
Why use a sum of Gaussians?
The apparent singularity of the first peak in the plot might suggest a single Gaussian, but this is not the case: the Gaussian representing this energy peak actually consists of two Gaussians summed together. The energy emission at that level comprises two distinct values, 24.002 and 24.210, and because of the resolution of the experiment we cannot distinguish them just by looking at the plot.
For all these reasons, I am trying to fix some parameters.
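To make the goal concrete, here is a minimal sketch of the two-parameter model I am aiming for, assuming the known means and amplitude ratios listed above and a single shared sigma (data_x and data_y stand for the arrays built in plot_convolved_spectrum):
import numpy as np
from scipy.optimize import curve_fit
MEANS = [24.210, 24.002, 27.276, 27.238]       # known emission energies (keV)
RATIOS = [1, 0.5538, 0.1673, 0.1673 * 0.5185]  # known amplitude ratios
def constrained_model(x, A1, sigma):
    # Only two free parameters: the first amplitude and one shared sigma
    y = np.zeros_like(x)
    for r, mu in zip(RATIOS, MEANS):
        y += A1 * r * np.exp(-(x - mu)**2 / (2 * sigma**2))
    return y
params, cov = curve_fit(constrained_model, data_x, data_y, p0=[1.0, 0.1])
With this formulation the means and ratios cannot drift at all, so curve_fit only searches the two-dimensional (A1, sigma) space.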
If anyone has trouble with the data, here it is:
Upvotes: 2
Views: 94
Reputation: 45
This is the most effective method I've found so far for fixing parameters in Python. I'm using the same dataset as before, but here I only fit the parameters of the first Gaussian, ignoring the fact that it is actually a combination of two Gaussians.
I've divided the fit into two parts: a rough estimate and a refined one. The refined fit restricts the data to within ±3 sigma, as that tends to perform better.
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
from scipy.optimize import curve_fit  # needed for the rough fit below
# a, b and data are assumed to be defined as in the question
# Define the Gaussian function
def gaussian(x, amplitude, mean, stddev):
    return amplitude * np.exp(-((x - mean) / stddev)**2 / 2)
# Data to be analysed
x_data = a + b * np.linspace(0, 885, 885)
y_data = data[:885] / np.max(data[:885])
#### Bounds for the first Gaussian
# Define bounds for the parameters: ([amplitude, mean, stddev] lower, then upper)
bounds = ([0, 24.21, 0], [1, 24.22, 2])
y_data_error = np.sqrt(data[:885]) / np.max(data[:885])
# Rough fit with curve_fit and bounds (the lmfit fit below refines this)
params, covariance = curve_fit(gaussian, x_data, y_data, bounds=bounds)
# Create the model
gaussian_model = Model(gaussian)
# Create parameters with initial values
A_value = 1.0 # Initial value for amplitude
mu_value = 24.21 # Initial value for mean
sigma_value = 1.0 # Initial value for standard deviation
params = gaussian_model.make_params(amplitude=A_value, mean=mu_value, stddev=sigma_value)
# Fix the mean (the amplitude is left free)
# params['amplitude'].vary = False
params['mean'].vary = False
# Fit the model to the data
result = gaussian_model.fit(y_data, params, x=x_data)
# Extract the parameters
amplitude = result.params['amplitude'].value
mean = result.params['mean'].value
stddev = result.params['stddev'].value
# Data interpolation
x_fine = np.linspace(x_data.min(), x_data.max(), num=1000)
y_fit= result.eval(x=x_fine)
# Convert the ±3 sigma energy window back to channel indices
ylower_limit = ((24.21 - 3 * stddev) - a) / b
yupper_limit = ((24.21 + 3 * stddev) - a) / b
# Select the values within the interval
y_limitado = y_data[int(ylower_limit):int(yupper_limit)]
x_limitado = a + b * np.linspace(int(ylower_limit), int(yupper_limit), len(y_limitado))  # len() keeps x and y the same length
ylimit_error = np.sqrt(data[int(ylower_limit):int(yupper_limit)]) / np.max(data[int(ylower_limit):int(yupper_limit)])
params1, covariance1 = curve_fit(gaussian, x_limitado, y_limitado, bounds=bounds)
# Create the model
gaussian_model = Model(gaussian)
# Create parameters with initial values
A_value1 = 1.0 # Initial value for amplitude
mu_value1 = 24.21 # Initial value for mean
sigma_value1 = 1.0 # Initial value for standard deviation
params1 = gaussian_model.make_params(amplitude=A_value1, mean=mu_value1, stddev=sigma_value1)
# Fix the mean (the amplitude is left free)
# params1['amplitude'].vary = False
params1['mean'].vary = False
# Fit the model to the data, weighting by the measurement errors
result1 = gaussian_model.fit(y_limitado, params1, x=x_limitado, weights=1.0 / ylimit_error)
# Extract the parameters
amplitude1 = result1.params['amplitude'].value
mean1 = result1.params['mean'].value
stddev1 = result1.params['stddev'].value
# Data interpolation
x_fine1 = np.linspace(x_limitado.min(), x_limitado.max(), num=1000)
y_fit1 = result1.eval(x=x_fine1)
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x_limitado,y_limitado, color='black', marker='o')
ax.plot(x_fine1,y_fit1,color='blue')
ax.plot(x_fine,y_fit,color='green')
ax.errorbar(x_limitado,y_limitado,yerr=ylimit_error, fmt='o', color='red', label='errorbar')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Gaussian Fit to Data')
ax.set_xlim((24.21 - 3 * stddev, 24.21 + 3 * stddev))
ax.legend()
ax.grid(True)
plt.show()
print("Amplitude:", amplitude)
print("Mean:", mean)
print("Standard Deviation:", stddev)
print("Amplitude:", amplitude1)
print("Mean:", mean1)
print("Standard Deviation:", stddev1)
Edit
Now I can fix more parameters, which lets me precisely adjust the fit of the sum of the first two Gaussians. However, I'm puzzled by the presence of an additional peak in the fit, and I'm unsure of its origin.
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
from scipy.optimize import curve_fit
# a, b and data are assumed to be defined as in the question
# Define Gaussian function
def gaussian(x, amplitude, mean, stddev):
    return amplitude * np.exp(-((x - mean) / stddev)**2 / 2)
# Generate x values for the data
x_data = a + b * np.linspace(0, 885, 885)
# Normalize y data
y_data = data[:885] / np.max(data[:885])
y_data_error = np.sqrt(data[:885]) / np.max(data[:885])
# Fit first Gaussian
bounds = ([1 - 0.0001, 24.21, 0], [1, 24.22, 2]) # Define bounds for the parameters
params, covariance = curve_fit(gaussian, x_data, y_data, bounds=bounds) # Fit Gaussian to data
# Create the model
gaussian_model = Model(gaussian)
# Initial values for parameters
A_value = 1.0
mu_value = 24.21
sigma_value = 1.0
params = gaussian_model.make_params(amplitude=A_value, mean=mu_value, stddev=sigma_value)
params['mean'].vary = False # Fix mean parameter
# Fit the model to the data
result = gaussian_model.fit(y_data, params, x=x_data)
# Extract parameters
amplitude = result.params['amplitude'].value
mean = result.params['mean'].value
stddev = result.params['stddev'].value
# Data interpolation
x_fine = np.linspace(x_data.min(), x_data.max(), num=1000)
y_fit = result.eval(x=x_fine)
# Determine y limits for data subset
ylower_limit = ((24.21 - 3 * stddev) - a) / b
yupper_limit = ((24.21 + 3 * stddev) - a) / b
# Select subset of data within the limits
y_limitado = y_data[int(ylower_limit):int(yupper_limit)]
x_limitado = a + b * np.linspace(int(ylower_limit), int(yupper_limit), len(y_limitado))  # len() keeps x and y the same length
ylimit_error = np.sqrt(data[int(ylower_limit):int(yupper_limit)]) / np.max(data[int(ylower_limit):int(yupper_limit)])
# Constrained fit
bounds_sum = ([0, 24.21, 0, 0, 24.0, 0], [1, 24.22, 1, 1, 24.1, 1]) # Bounds for parameters
# Define function for sum of Gaussians
def sum_of_gaussians(x, amplitude1, mean1, stddev1, amplitude2, mean2, stddev2):
    gaussian1 = amplitude1 * np.exp(-((x - mean1) / stddev1)**2 / 2)
    gaussian2 = amplitude2 * np.exp(-((x - mean2) / stddev2)**2 / 2)
    return gaussian1 + gaussian2
# Fit the sum of Gaussians to the subset of data
params1, covariance1 = curve_fit(sum_of_gaussians, x_limitado, y_limitado, bounds=bounds_sum, sigma=ylimit_error)
# Create the model
multigaussian_model = Model(sum_of_gaussians)
# Initial values for parameters
A_value1 = amplitude
mu_value1 = 24.22
sigma_value1 = 0.1
A_value2 = amplitude * 0.5538  # known amplitude ratio A2/A1
mu_value2 = 24.002
sigma_value2 = 0.1
# Set parameters for the model
params1 = multigaussian_model.make_params(amplitude1=A_value1, mean1=mu_value1, stddev1=sigma_value1,
                                          amplitude2=A_value2, mean2=mu_value2, stddev2=sigma_value2)
# Fix certain parameters
params1['amplitude1'].vary = False
params1['mean1'].vary = False
params1['amplitude2'].vary = False
params1['mean2'].vary = False
# Fit the model to the data, weighting by the measurement errors
result1 = multigaussian_model.fit(y_limitado, params1, x=x_limitado, weights=1.0 / ylimit_error)
# Extract parameters
amplitude1 = result1.params['amplitude1'].value
mean1 = result1.params['mean1'].value
stddev1 = result1.params['stddev1'].value
amplitude2 = result1.params['amplitude2'].value
mean2 = result1.params['mean2'].value
stddev2 = result1.params['stddev2'].value
# Data interpolation
x_fine1 = np.linspace(x_limitado.min(), x_limitado.max(), num=1000)
y_fit1 = result1.eval(x=x_fine1)
# Individual Gaussians
x = np.linspace(min(mean1 - 3 * stddev1, mean2 - 3 * stddev2),
max(mean1 + 3 * stddev1, mean2 + 3 * stddev2), 1000)
y1 = gaussian(x, amplitude1, mean1, stddev1)
y2 = gaussian(x, amplitude2, mean2, stddev2)
# Plotting
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x_limitado, y_limitado, color='black', marker='o')
ax.plot(x_fine1, y_fit1, color='blue')
ax.errorbar(x_limitado, y_limitado, yerr=ylimit_error, fmt='o', color='red', label='errorbar')
ax.plot(x, y1, label='Gaussian 1')
ax.plot(x, y2, label='Gaussian 2')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Gaussian Fit to Data')
ax.set_xlim((24.21 - 3 * stddev, 24.21 + 3 * stddev))
ax.legend()
ax.grid(True)
plt.show()
# Print parameters
print("Amplitude1:", amplitude1)
print("Mean1:", mean1)
print("Standard Deviation1:", stddev1)
print("")
print("Amplitude2:", amplitude2)
print("Mean2:", mean2)
print("Standard Deviation2:", stddev2)
Upvotes: 1
Reputation: 11002
Here is a MCVE showing how to regress a variable number of peaks from the significant part of your signal.
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import signal, stats, optimize
We load your data and post-process it to get physical series:
data = pd.read_csv("20240402_In.mca", encoding="latin-1", names=["y"]).loc[12:2058]
data["y"] = pd.to_numeric(data["y"])
data["y"] /= data["y"].max()
data["x"] = -0.0188026396003431 + 0.039549044037714 * data.index
data = data.loc[(20. <= data["x"]) & (data["x"] <= 30.), :]
We create a model that accepts a variable number of peaks (the number is driven by the find_peaks solution):
def peak(x, A, s, x0):
    law = stats.norm(loc=x0, scale=s)
    return A * law.pdf(x) / law.pdf(x0)

def model(x, *parameters):
    n = len(parameters) // 3
    y = np.zeros_like(x)
    for i in range(n):
        y += peak(x, *parameters[(i * 3):((i + 1) * 3)])
    return y
We identify peaks:
indices, bases = signal.find_peaks(data.y, prominence=0.0125)
#(array([ 67, 118, 195, 210]),
# {'prominences': array([0.01987281, 0.99920509, 0.18918919, 0.03338633]),
# 'left_bases': array([ 1, 1, 179, 205]),
# 'right_bases': array([ 76, 160, 219, 219])})
This is the key point: from the identification, we build our educated guess:
x0s = data.x.values[indices]
As = bases["prominences"]
ss = (data.x.values[bases["right_bases"]] - data.x.values[bases["left_bases"]]) / 8.
p0 = list(itertools.chain(*zip(As, ss, x0s)))
#[0.01987281399046105,
# 0.37077228785356864,
# 22.682348638047493,
# 0.9992050874403816,
# 0.7860372502495658,
# 24.699349883970907,
# 0.1891891891891892,
# 0.19774522018856988,
# 27.744626274874886,
# 0.033386327503974564,
# 0.06921082706599968,
# 28.337861935440593]
Now we regress the model wrt your dataset:
popt, pcov = optimize.curve_fit(model, data.x, data.y, p0=p0)
#array([1.03735804e-02, 9.61270732e-01, 2.29030214e+01, 9.92651381e-01,
# 1.59694755e-01, 2.46561911e+01, 1.85332645e-01, 1.21882422e-01,
# 2.77807838e+01, 3.67805911e-02, 1.21890416e-01, 2.83583849e+01])
We can now estimate the model:
yhat = model(data.x, *popt)
Graphically, it leads to:
This is quite satisfactory for the unconstrained fit, but it certainly does not fulfill the constraints you want to enforce; a sketch of how to impose them within this same framework follows.
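For completeness, a minimal sketch of such a constrained fit, reusing the peak function above and assuming the means and amplitude ratios quoted in your post, with a single global amplitude A and one shared width s:
MEANS = np.array([24.210, 24.002, 27.276, 27.238])
RATIOS = np.array([1., 0.5538, 0.1673, 0.1673 * 0.5185])
def constrained_model(x, A, s):
    # Two free parameters: global amplitude A and shared width s
    return sum(peak(x, A * r, s, x0) for r, x0 in zip(RATIOS, MEANS))
popt_c, pcov_c = optimize.curve_fit(constrained_model, data.x, data.y, p0=[1., 0.2])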
Could you publish your data in your post as arrays, ready to be used in code?
Upvotes: 3