Can I fit a spectrum of multiple gaussians whos centroids and peak heights are randomly distributed?

Question

Newbie here, but I've tried to do my due diligence before posting. Apologies for any unintentional faux pas.

I'm acquiring data from an oscilloscope in the form of a Voltage vs. Time series. The time bins are 0.8nano seconds wide. I run multiple 'data capture' cycles. A single capture will have a fixed number of samples, and between 5 to 15 gaussian peaks with the exact number of peaks being unknown. The gaussian peaks have a relatively constrained FWHM (between 2 and 3 nanoseconds), a varying peak height, and a random arrival time (i.e centroid position is not periodic).

I've been using Python to fit gaussians to this data and have had some success using the scipy.optimise library and also with the astropy library. Code using scipy.optimise is included below. I can fit multiple gaussians but a key step in my code is providing a "guess" for number of peaks, and for each peak an estimate of the centroid positions, peak height, and peak widths. Is there a way to generalise this code and not have to provide a 'guess'? If I relax the conditions in the 'guess' the fits lose quality. I know that the peaks will be gaussians with a well constrained width, but would like to generalise the code to fit peak centroids and peak heights in any given data capture.

import ctypes
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

#Get data from file
with open('test3.txt') as f:
    w, h = [float(x) for x in next(f).split()]
    print(w, h)
    array = [[float(x) for x in line.split()] for line in f]

#Separate    
x, z = zip(*array)
#Change sign since fitting routine seems to
#prefer positive numbers
y=[ -p for p in z]

def func(x, *params):
    y = np.zeros_like(x)
    for i in range(0, len(params), 3):
        ctr = params[i]
        amp = params[i+1]
        wid = params[i+2]
        y = y + amp * np.exp( -((x - ctr)/wid)**2)
    return y

#Guess the peak positions, heights, and widths
guess = [16, 5, 2, 75, 5, 2, 105, 5, 2, 139, 5, 2, 225, 5, 2, 315, 5, 2, 330, 5, 2]

#Fit and print parameters to screen
popt, pcov = curve_fit(func, x, y, p0=guess)
print(popt)
fit = func(x, *popt)

#Plot stuff
plt.plot(x, y)
plt.plot(x, fit , 'r-')
plt.show()

Results look like this: Plot of Data and Fits

Data file is here: https://spaces.hightail.com/receive/5MY7Vc7r9R

This is similar to How can I fit multiple Gaussian curved to mass spectrometry data in Python? and fit multiple gaussians to the data in python, but these two rely on fitting periodic datasets or datasets with known peak positions, widths and heights. They were useful in getting me this far, but I'm stuck now. Any ideas, or suggestions I can follow up on?

Thanks, Adi

Mark Petrov · Accepted Answer

My idea is that we compare the value of the curve with its average value.
The multiplier variable means how many times the value must be greater than the average in order for us to understand that this is one of the peaks. The first point for a peak exceeding this value is considered the starting point for approximating the average value of this peak.
I also replaced the lists with arrays for x and y.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

#Get data from file
with open('test3.txt') as f:
    w, h = [float(x) for x in next(f).split()]
    array = [[float(x) for x in line.split()] for line in f]

#Separate    
x, z = zip(*array)
x = np.array(x)
y = np.array([ -p for p in z])
#Change sign since fitting routine seems to
#prefer positive numbers


def func(x, *params):
    y = np.zeros_like(x)
    for i in range(0, len(params), 3):
        ctr = params[i]
        amp = params[i+1]
        wid = params[i+2]
        y = y + amp * np.exp( -((x - ctr)/wid)**2)
    return y

#Guess the peak positions, heights, and widths
# guess = [16, 5, 2, 75, 5, 2, 105, 5, 2, 139, 5, 2, 225, 5, 2, 315, 5, 2, 330, 5, 2]

def getPeaks(x, y, multiplier):
    x_peaks = []
    isPeak = False
    for i, j in zip(x, y):
        if j > y.mean() * multiplier:
            if not isPeak:
                isPeak = True
                x_peaks.append(i)
        else:
            isPeak = False
    return x_peaks

multiplier = 3
x_peaks = getPeaks(x, y, multiplier)

guess = []
for i in range(len(x_peaks)):
    guess.append(x_peaks[i])
    guess.append(5)
    guess.append(2)
    

#Fit and print parameters to screen
popt, pcov = curve_fit(func, x, y, p0=guess)
print(popt)
fit = func(x, *popt)

#Plot stuff
plt.plot(x, y)
plt.plot(x, fit , 'r--')
# plt.plot(popt[::3], np.ones_like(popt[::3]) * multiplier, 'ko')
plt.show()

Can I fit a spectrum of multiple gaussians whos centroids and peak heights are randomly distributed?

Answers (2)

Related Questions