Numpy: Numerical integration with integration limits

Question

I have measured peaks that I want to integrate in a certain range.

The data I want to integrate is in the form of numpy arrays with wavenumbers and intensities:

peakQ1_2500_smoothened =
array([[ 1.95594400e+04, -3.70074342e-17,  3.26000000e+00],
       [ 1.95594500e+04,  1.66666667e-03,  4.81500000e+00],
       [ 1.95594600e+04,  2.83333333e-02,  4.80833333e+00],
       [ 1.95594700e+04,  1.33333333e-02,  4.82166667e+00],
       [ 1.95594800e+04,  5.00000000e-03,  4.92416667e+00],
       [ 1.95594900e+04,  5.55555556e-04,  4.99305556e+00],
       [ 1.95595100e+04, -7.77777778e-03,  5.03972222e+00],
       [ 1.95595200e+04, -5.55555556e-03,  4.96888889e+00],
       [ 1.95595300e+04, -1.77777778e-02,  4.91333333e+00],
       [ 1.95595400e+04,  1.38888889e-02,  4.82500000e+00],
       [ 1.95595500e+04,  7.05555556e-02,  4.85722222e+00],
       [ 1.95595600e+04,  1.43888889e-01,  4.86638889e+00],
       [ 1.95595700e+04,  1.98888889e-01,  4.85138889e+00],
       [ 1.95595800e+04,  2.84444444e-01,  4.90694444e+00],
       [ 1.95595900e+04,  4.64444444e-01,  4.93611111e+00],
       [ 1.95596000e+04,  6.61111111e-01,  4.98166667e+00],
       [ 1.95596100e+04,  9.61666667e-01,  4.96722222e+00],
       [ 1.95596200e+04,  1.23222222e+00,  4.94388889e+00],
       [ 1.95596400e+04,  1.43555556e+00,  5.02166667e+00],
       [ 1.95596500e+04,  1.53222222e+00,  5.00500000e+00],
       [ 1.95596600e+04,  1.59833333e+00,  5.03666667e+00],
       [ 1.95596700e+04,  1.66388889e+00,  4.94555556e+00],
       [ 1.95596800e+04,  1.60111111e+00,  4.92777778e+00],
       [ 1.95596900e+04,  1.42333333e+00,  4.94666667e+00],
       [ 1.95597000e+04,  1.14111111e+00,  5.00777778e+00],
       [ 1.95597100e+04,  9.52222222e-01,  5.08555556e+00],
       [ 1.95597200e+04,  7.25555556e-01,  5.09222222e+00],
       [ 1.95597300e+04,  5.80555556e-01,  5.08055556e+00],
       [ 1.95597400e+04,  3.92777778e-01,  5.09611111e+00],
       [ 1.95597500e+04,  2.43222222e-01,  5.01655556e+00],
       [ 1.95597600e+04,  1.36555556e-01,  4.99822222e+00],
       [ 1.95597700e+04,  6.32222222e-02,  4.87044444e+00],
       [ 1.95597800e+04,  3.88888889e-02,  4.91944444e+00],
       [ 1.95597900e+04,  3.22222222e-02,  4.93611111e+00],
       [ 1.95598000e+04,  2.44444444e-02,  5.10277778e+00],
       [ 1.95598100e+04,  5.11111111e-02,  5.11277778e+00],
       [ 1.95598200e+04,  4.44444444e-02,  5.21944444e+00],
       [ 1.95598300e+04,  4.33333333e-02,  5.05333333e+00],
       [ 1.95598400e+04,  3.58333333e-02,  5.08750000e+00],
       [ 1.95598500e+04,  7.50000000e-03,  5.12750000e+00],
       [ 1.95598600e+04,  4.16666667e-03,  5.22916667e+00],
       [ 1.95598800e+04, -1.33333333e-02,  3.51000000e+00]])

I found that I can do an integration over the whole array with:

def integratePeak(yvals, xvals):
    I = np.trapz(yvals, x = xvals)
    return I

But how do I make an integration with x-limits, for example from 19559.52 to 19559.78?

def integratePeak(yvals, xvals, xlower, xupper):
    '''integrate y over x from xlower to xupper'''
    return I

I could of course give the x- and y-values by explicitly referring to array elements as peakQ1_2500_smoothened[7:33,0] and peakQ1_2500_smoothened[7:33,1] but obviously I do not want to refer to array elements but define the integration limits as wavenumbers because the different measured peaks have different array lengths.

Functions for reducing to one data point per wavenumber and then taking a running average:

def averagePerWavenumber(data):
    wavenum, intensity, power = data[:,0], data[:,1], data[:,2]
    wavenum_unique, intensity_mean = npi.group_by(wavenum).mean(intensity)
    wavenum_unique, power_mean = npi.group_by(wavenum).mean(power)
    output = np.zeros(shape=(len(wavenum_unique), 3))
    output[:,0] = wavenum_unique
    output[:,1] = intensity_mean
    output[:,2] = power_mean
    return output

def smoothening(data, bins):
    output = np.zeros(shape=(len(data[:,0]), 3))
    output[:,0] = data[:,0]
    output[:,1] = np.convolve(data[:,1], np.ones(bins), mode='same') / bins
    output[:,2] = np.convolve(data[:,2], np.ones(bins), mode='same') / bins
    return output

Mad Physicist · Accepted Answer

Let's start by looking at what np.trapz actually does. The area of the ith trapezoid is the average height times the width: 0.5 * (y[i + 1] + y[i]) * (x[i + 1] - x[i]). If you have a fixed dx instead of an x array, the last term is just a scalar. So let's rewrite your first function:

def integrate_peak0(y, x):
    """ x can be array of same size as y or a scalar """
    dx = x if x.size <= 1 else np.diff(x)
    return np.sum(0.5 * (y[1:] + y[:-1]) * dx)

The hardest part now is interpolating the limits of integration. Since x is sorted, you can use np.searchsorted to convert the limits into indices into data:

limits = np.array([xlower, xupper])
indices = np.searchsorted(x, limits)

If the limits always fall on exact values of x, you can use indices directly:

def integrate_peak1(y, x, xlower, xupper):
    indices = np.searchsorted(x, [xlower, xupper])
    s = slice(indices[0], indices[1] + 1)
    return np.trapz(y[s], x[s])

Since this will almost never be the case, you can try the next simplest approach: rounding to the nearest value. You can use fancy indexing to get a 2D array for each potential bound that you can apply np.argmin to:

candidates = x[np.stack((indices - 1, indices), axis=0)]
offset = np.abs(candidates - limits).argmin(axis=0) - 1
indices += offset

candidates is a 2x2 array with the columns representing the candiates for each bound, and the rows representing the lesser and greater candidate. offset will be the amount that you need to modify the index by to get the nearest neighbor. Here is a version of the integrator that selects the nearest bin based on the integration limits:

def integrate_peak2(y, x, xlower, xupper):
    limits = np.array([xlower, xupper])
    indices = np.searchsorted(x, limits)
    candidates = x[np.stack((indices - 1, indices), axis=0)]
    indices += np.abs(candidates - limits).argmin(axis=0) - 1

    s = slice(indices[0], indices[1] + 1)
    return np.trapz(y[s], x[s])

The final version is to interpolate the values of y based on x. This version can be implemented in one of two ways. You can either compute the target y-values and pass them to np.trapz with the appropriate x, or you can use the function defined in integrate_peak0 to do the operation yourself.

Given an element x[i] < xn <= x[i + 1], you can estimate yn = y[i] + (y[i + 1] - y[i]) * (x[n] - x[i]) / (x[i + 1] - x[i]). Here, x[i] and x[i + 1] are the values of candidates shown above. y[i] and y[i + 1] are the corresponding elements of y. xn is limits. So you can compute the interpolation in a couple of different ways.

One way is to adjust the inputs to trapz:

def integrate_peak3a(y, x, xlower, xupper):
    limits = np.array([xlower, xupper])
    indices = np.searchsorted(x, limits)
    indices = np.stack((indices - 1, indices), axis=0)
    xi = x[indices]
    yi = y[indices]
    yn = yi[0] + np.diff(yi, axis=0) * (limits - xi[0]) / np.diff(xi, axis=0)

    indices = indices[[1, 0], [0, 1]]
    s = slice(indices[0], indices[1] + 1)
    return np.trapz(np.r_[yn[0, 0], y[s], yn[0, 1]], np.r_[xlower, x[s], xupper])

Another way is to compute the sums of the edge fragments manually:

def integrate_peak3b(y, x, xlower, xupper):
    limits = np.array([xlower, xupper])
    indices = np.searchsorted(x, limits)
    indices = np.stack((indices - 1, indices), axis=0)
    xi = x[indices]
    yi = y[indices]
    yn = yi[0] + np.diff(yi, axis=0) * (limits - xi[0]) / np.diff(xi, axis=0)

    indices = indices[[1, 0], [0, 1]]
    s = slice(indices[0], indices[1] + 1)
    return np.trapz(y[s], x[s]) - 0.5 * np.diff((yn + y[indices]) * (x[indices] - limits))

And of course, you can run the inputs to np.trapz in integrate_peak3a through the "manual" calculation in integrate_peak0.

Checking that the limits of integration are within an acceptable range and in the right order is left as an exercise for the reader in all these cases.

Numpy: Numerical integration with integration limits

Answers (2)

Related Questions