Reputation: 3
Newbie here, but I've tried to do my due diligence before posting. Apologies for any unintentional faux pas.
I'm acquiring data from an oscilloscope in the form of a Voltage vs. Time series. The time bins are 0.8nano seconds wide. I run multiple 'data capture' cycles. A single capture will have a fixed number of samples, and between 5 to 15 gaussian peaks with the exact number of peaks being unknown. The gaussian peaks have a relatively constrained FWHM (between 2 and 3 nanoseconds), a varying peak height, and a random arrival time (i.e centroid position is not periodic).
I've been using Python to fit gaussians to this data and have had some success using the scipy.optimise library and also with the astropy library. Code using scipy.optimise is included below. I can fit multiple gaussians but a key step in my code is providing a "guess" for number of peaks, and for each peak an estimate of the centroid positions, peak height, and peak widths. Is there a way to generalise this code and not have to provide a 'guess'? If I relax the conditions in the 'guess' the fits lose quality. I know that the peaks will be gaussians with a well constrained width, but would like to generalise the code to fit peak centroids and peak heights in any given data capture.
import ctypes
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
#Get data from file
with open('test3.txt') as f:
w, h = [float(x) for x in next(f).split()]
print(w, h)
array = [[float(x) for x in line.split()] for line in f]
#Separate
x, z = zip(*array)
#Change sign since fitting routine seems to
#prefer positive numbers
y=[ -p for p in z]
def func(x, *params):
y = np.zeros_like(x)
for i in range(0, len(params), 3):
ctr = params[i]
amp = params[i+1]
wid = params[i+2]
y = y + amp * np.exp( -((x - ctr)/wid)**2)
return y
#Guess the peak positions, heights, and widths
guess = [16, 5, 2, 75, 5, 2, 105, 5, 2, 139, 5, 2, 225, 5, 2, 315, 5, 2, 330, 5, 2]
#Fit and print parameters to screen
popt, pcov = curve_fit(func, x, y, p0=guess)
print(popt)
fit = func(x, *popt)
#Plot stuff
plt.plot(x, y)
plt.plot(x, fit , 'r-')
plt.show()
Results look like this: Plot of Data and Fits
Data file is here: https://spaces.hightail.com/receive/5MY7Vc7r9R
This is similar to How can I fit multiple Gaussian curved to mass spectrometry data in Python? and fit multiple gaussians to the data in python, but these two rely on fitting periodic datasets or datasets with known peak positions, widths and heights. They were useful in getting me this far, but I'm stuck now. Any ideas, or suggestions I can follow up on?
Thanks, Adi
Upvotes: 0
Views: 519
Reputation: 498
As mentioned in the comments every iterate algorithm that estimates need to start from some hyper-params . In the problem you are describing you have the initial gaussians params and the number of gaussians. When estimating a gaussians distributions the EM algorithm is proven to converge. I suggest to use it with random intial gaussians params and grid search for the best solution for number of distributions. Start with 5 till 15 and calculate the distance for each solutions and take the minimum distance solution. (https://en.m.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)
Upvotes: 0
Reputation: 51
My idea is that we compare the value of the curve with its average value.
The multiplier
variable means how many times the value must be greater than the average in order for us to understand that this is one of the peaks. The first point for a peak exceeding this value is considered the starting point for approximating the average value of this peak.
I also replaced the lists with arrays for x and y.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
#Get data from file
with open('test3.txt') as f:
w, h = [float(x) for x in next(f).split()]
array = [[float(x) for x in line.split()] for line in f]
#Separate
x, z = zip(*array)
x = np.array(x)
y = np.array([ -p for p in z])
#Change sign since fitting routine seems to
#prefer positive numbers
def func(x, *params):
y = np.zeros_like(x)
for i in range(0, len(params), 3):
ctr = params[i]
amp = params[i+1]
wid = params[i+2]
y = y + amp * np.exp( -((x - ctr)/wid)**2)
return y
#Guess the peak positions, heights, and widths
# guess = [16, 5, 2, 75, 5, 2, 105, 5, 2, 139, 5, 2, 225, 5, 2, 315, 5, 2, 330, 5, 2]
def getPeaks(x, y, multiplier):
x_peaks = []
isPeak = False
for i, j in zip(x, y):
if j > y.mean() * multiplier:
if not isPeak:
isPeak = True
x_peaks.append(i)
else:
isPeak = False
return x_peaks
multiplier = 3
x_peaks = getPeaks(x, y, multiplier)
guess = []
for i in range(len(x_peaks)):
guess.append(x_peaks[i])
guess.append(5)
guess.append(2)
#Fit and print parameters to screen
popt, pcov = curve_fit(func, x, y, p0=guess)
print(popt)
fit = func(x, *popt)
#Plot stuff
plt.plot(x, y)
plt.plot(x, fit , 'r--')
# plt.plot(popt[::3], np.ones_like(popt[::3]) * multiplier, 'ko')
plt.show()
Upvotes: 0