Reputation: 133
I have measured peaks that I want to integrate in a certain range.
The data I want to integrate is in the form of numpy arrays with wavenumbers and intensities:
peakQ1_2500_smoothened =
array([[ 1.95594400e+04, -3.70074342e-17, 3.26000000e+00],
[ 1.95594500e+04, 1.66666667e-03, 4.81500000e+00],
[ 1.95594600e+04, 2.83333333e-02, 4.80833333e+00],
[ 1.95594700e+04, 1.33333333e-02, 4.82166667e+00],
[ 1.95594800e+04, 5.00000000e-03, 4.92416667e+00],
[ 1.95594900e+04, 5.55555556e-04, 4.99305556e+00],
[ 1.95595100e+04, -7.77777778e-03, 5.03972222e+00],
[ 1.95595200e+04, -5.55555556e-03, 4.96888889e+00],
[ 1.95595300e+04, -1.77777778e-02, 4.91333333e+00],
[ 1.95595400e+04, 1.38888889e-02, 4.82500000e+00],
[ 1.95595500e+04, 7.05555556e-02, 4.85722222e+00],
[ 1.95595600e+04, 1.43888889e-01, 4.86638889e+00],
[ 1.95595700e+04, 1.98888889e-01, 4.85138889e+00],
[ 1.95595800e+04, 2.84444444e-01, 4.90694444e+00],
[ 1.95595900e+04, 4.64444444e-01, 4.93611111e+00],
[ 1.95596000e+04, 6.61111111e-01, 4.98166667e+00],
[ 1.95596100e+04, 9.61666667e-01, 4.96722222e+00],
[ 1.95596200e+04, 1.23222222e+00, 4.94388889e+00],
[ 1.95596400e+04, 1.43555556e+00, 5.02166667e+00],
[ 1.95596500e+04, 1.53222222e+00, 5.00500000e+00],
[ 1.95596600e+04, 1.59833333e+00, 5.03666667e+00],
[ 1.95596700e+04, 1.66388889e+00, 4.94555556e+00],
[ 1.95596800e+04, 1.60111111e+00, 4.92777778e+00],
[ 1.95596900e+04, 1.42333333e+00, 4.94666667e+00],
[ 1.95597000e+04, 1.14111111e+00, 5.00777778e+00],
[ 1.95597100e+04, 9.52222222e-01, 5.08555556e+00],
[ 1.95597200e+04, 7.25555556e-01, 5.09222222e+00],
[ 1.95597300e+04, 5.80555556e-01, 5.08055556e+00],
[ 1.95597400e+04, 3.92777778e-01, 5.09611111e+00],
[ 1.95597500e+04, 2.43222222e-01, 5.01655556e+00],
[ 1.95597600e+04, 1.36555556e-01, 4.99822222e+00],
[ 1.95597700e+04, 6.32222222e-02, 4.87044444e+00],
[ 1.95597800e+04, 3.88888889e-02, 4.91944444e+00],
[ 1.95597900e+04, 3.22222222e-02, 4.93611111e+00],
[ 1.95598000e+04, 2.44444444e-02, 5.10277778e+00],
[ 1.95598100e+04, 5.11111111e-02, 5.11277778e+00],
[ 1.95598200e+04, 4.44444444e-02, 5.21944444e+00],
[ 1.95598300e+04, 4.33333333e-02, 5.05333333e+00],
[ 1.95598400e+04, 3.58333333e-02, 5.08750000e+00],
[ 1.95598500e+04, 7.50000000e-03, 5.12750000e+00],
[ 1.95598600e+04, 4.16666667e-03, 5.22916667e+00],
[ 1.95598800e+04, -1.33333333e-02, 3.51000000e+00]])
I found that I can do an integration over the whole array with:
def integratePeak(yvals, xvals):
I = np.trapz(yvals, x = xvals)
return I
But how do I make an integration with x-limits, for example from 19559.52 to 19559.78?
def integratePeak(yvals, xvals, xlower, xupper):
'''integrate y over x from xlower to xupper'''
return I
I could of course give the x- and y-values by explicitly referring to array elements as peakQ1_2500_smoothened[7:33,0]
and peakQ1_2500_smoothened[7:33,1]
but obviously I do not want to refer to array elements but define the integration limits as wavenumbers because the different measured peaks have different array lengths.
Functions for reducing to one data point per wavenumber and then taking a running average:
def averagePerWavenumber(data):
wavenum, intensity, power = data[:,0], data[:,1], data[:,2]
wavenum_unique, intensity_mean = npi.group_by(wavenum).mean(intensity)
wavenum_unique, power_mean = npi.group_by(wavenum).mean(power)
output = np.zeros(shape=(len(wavenum_unique), 3))
output[:,0] = wavenum_unique
output[:,1] = intensity_mean
output[:,2] = power_mean
return output
def smoothening(data, bins):
output = np.zeros(shape=(len(data[:,0]), 3))
output[:,0] = data[:,0]
output[:,1] = np.convolve(data[:,1], np.ones(bins), mode='same') / bins
output[:,2] = np.convolve(data[:,2], np.ones(bins), mode='same') / bins
return output
Upvotes: 3
Views: 4280
Reputation: 114488
Let's start by looking at what np.trapz
actually does. The area of the i
th trapezoid is the average height times the width: 0.5 * (y[i + 1] + y[i]) * (x[i + 1] - x[i])
. If you have a fixed dx
instead of an x
array, the last term is just a scalar. So let's rewrite your first function:
def integrate_peak0(y, x):
""" x can be array of same size as y or a scalar """
dx = x if x.size <= 1 else np.diff(x)
return np.sum(0.5 * (y[1:] + y[:-1]) * dx)
The hardest part now is interpolating the limits of integration. Since x
is sorted, you can use np.searchsorted
to convert the limits into indices into data:
limits = np.array([xlower, xupper])
indices = np.searchsorted(x, limits)
If the limits always fall on exact values of x
, you can use indices
directly:
def integrate_peak1(y, x, xlower, xupper):
indices = np.searchsorted(x, [xlower, xupper])
s = slice(indices[0], indices[1] + 1)
return np.trapz(y[s], x[s])
Since this will almost never be the case, you can try the next simplest approach: rounding to the nearest value. You can use fancy indexing to get a 2D array for each potential bound that you can apply np.argmin
to:
candidates = x[np.stack((indices - 1, indices), axis=0)]
offset = np.abs(candidates - limits).argmin(axis=0) - 1
indices += offset
candidates
is a 2x2 array with the columns representing the candiates for each bound, and the rows representing the lesser and greater candidate. offset
will be the amount that you need to modify the index by to get the nearest neighbor. Here is a version of the integrator that selects the nearest bin based on the integration limits:
def integrate_peak2(y, x, xlower, xupper):
limits = np.array([xlower, xupper])
indices = np.searchsorted(x, limits)
candidates = x[np.stack((indices - 1, indices), axis=0)]
indices += np.abs(candidates - limits).argmin(axis=0) - 1
s = slice(indices[0], indices[1] + 1)
return np.trapz(y[s], x[s])
The final version is to interpolate the values of y
based on x
. This version can be implemented in one of two ways. You can either compute the target y-values and pass them to np.trapz
with the appropriate x
, or you can use the function defined in integrate_peak0
to do the operation yourself.
Given an element x[i] < xn <= x[i + 1]
, you can estimate yn = y[i] + (y[i + 1] - y[i]) * (x[n] - x[i]) / (x[i + 1] - x[i])
. Here, x[i]
and x[i + 1]
are the values of candidates
shown above. y[i]
and y[i + 1]
are the corresponding elements of y
. xn
is limits
. So you can compute the interpolation in a couple of different ways.
One way is to adjust the inputs to trapz
:
def integrate_peak3a(y, x, xlower, xupper):
limits = np.array([xlower, xupper])
indices = np.searchsorted(x, limits)
indices = np.stack((indices - 1, indices), axis=0)
xi = x[indices]
yi = y[indices]
yn = yi[0] + np.diff(yi, axis=0) * (limits - xi[0]) / np.diff(xi, axis=0)
indices = indices[[1, 0], [0, 1]]
s = slice(indices[0], indices[1] + 1)
return np.trapz(np.r_[yn[0, 0], y[s], yn[0, 1]], np.r_[xlower, x[s], xupper])
Another way is to compute the sums of the edge fragments manually:
def integrate_peak3b(y, x, xlower, xupper):
limits = np.array([xlower, xupper])
indices = np.searchsorted(x, limits)
indices = np.stack((indices - 1, indices), axis=0)
xi = x[indices]
yi = y[indices]
yn = yi[0] + np.diff(yi, axis=0) * (limits - xi[0]) / np.diff(xi, axis=0)
indices = indices[[1, 0], [0, 1]]
s = slice(indices[0], indices[1] + 1)
return np.trapz(y[s], x[s]) - 0.5 * np.diff((yn + y[indices]) * (x[indices] - limits))
And of course, you can run the inputs to np.trapz
in integrate_peak3a
through the "manual" calculation in integrate_peak0
.
Checking that the limits of integration are within an acceptable range and in the right order is left as an exercise for the reader in all these cases.
Upvotes: 3
Reputation: 11181
def integratePeak(yvals, xvals, xlower, xupper):
'''integrate y over x from xlower to xupper.
Use trapz to integrate over points closest to xlower, xupper.
the +1 to idx_max is for numpy half-open indexing.
'''
idx_min = np.argmin(np.abs(xvals - xlower))
idx_max = np.argmin(np.abs(xvals - xupper)) + 1
result = np.trapz(yvals[idx_min:idx_max], x=xvals[idx_min:idx_max])
return result
As an aside, you may benefit from using pandas for tabular data - it interoperates well with numpy arrays and most importantly lets you label your data:
import pandas as pd
df = pd.DataFrame(peakQ1_2500_smoothened, columns=["wave_num", "intensity", "col3"])
integratePeak(yvals=df.intensity, xvals=df.wave_num, xlower=19559.52, xupper=19559.78)
# 0.18853555549577536
Upvotes: 1