Using numpy/scipy to identify slope changes in digital signals?

Question

I am trying to come up with a generalised way in Python to identify pitch rotations occurring during a set of planned spacecraft manoeuvres. You could think of it as a particular case of a shift detection problem.

Let's consider the solar_elevation_angle variable in my set of measurements, identifying the elevation angle of the sun measured from the spacecraft's instrument. For those who might want to play with the data, I saved the solar_elevation_angle.txt file here.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from scipy.signal import argrelmax
from scipy.ndimage.filters import gaussian_filter1d

solar_elevation_angle = np.loadtxt("solar_elevation_angle.txt", dtype=np.float32)

fig, ax = plt.subplots()    
ax.set_title('Solar elevation angle')
ax.set_xlabel('Scanline')
ax.set_ylabel('Solar elevation angle [deg]')
ax.plot(solar_elevation_angle)
plt.show()

The scanline is my time dimension. The four points where the slope changes identify the spacecraft pitch rotations.

As you can see, the solar elevation angle evolution outside the spacecraft manoeuvres regions is pretty much linear as a function of time, and that should always be the case for this particular spacecraft (except for major failures).

Note that during each spacecraft manoeuvre, the slope change is obviously continuous, although discretised in my set of angle values. That means: for each manoeuvre, it does not really make sense to try to locate a single scanline where a manoeuvre has taken place. My goal is rather to identify, for each manoeuvre, a "representative" scanline in the range of scanlines defining the interval of time where the manoeuvre occurred (e.g. middle value, or left boundary).

Once I get a set of "representative" scanline indexes where all manoeuvres have taken place, I could then use those indexes for rough estimations of manoeuvres durations, or to automatically place labels on the plot.

My solution so far has been to:

Compute the 2nd derivative of the solar elevation angle using np.gradient.
Compute absolute value and clipping of resulting curve. The clipping is necessary because of what I assume to be discretisation noise in the linear segments, which would then severely affect the identification of the "real" local maxima in point 4.
Apply smoothing to the resulting curve, to get rid of multiple peaks. I'm using scipy's 1d gaussian filter with a trial-and-error sigma value for that.
Identify local maxima.

Here's my code:

fig = plt.figure(figsize=(8,12))
gs = gridspec.GridSpec(5, 1) 

ax0 = plt.subplot(gs[0])
ax0.set_title('Solar elevation angle')
ax0.plot(solar_elevation_angle)

solar_elevation_angle_1stdev = np.gradient(solar_elevation_angle)
ax1 = plt.subplot(gs[1])
ax1.set_title('1st derivative')
ax1.plot(solar_elevation_angle_1stdev)

solar_elevation_angle_2nddev = np.gradient(solar_elevation_angle_1stdev)
ax2 = plt.subplot(gs[2])
ax2.set_title('2nd derivative')
ax2.plot(solar_elevation_angle_2nddev)

solar_elevation_angle_2nddev_clipped = np.clip(np.abs(np.gradient(solar_elevation_angle_2nddev)), 0.0001, 2)
ax3 = plt.subplot(gs[3])
ax3.set_title('absolute value + clipping')
ax3.plot(solar_elevation_angle_2nddev_clipped)

smoothed_signal = gaussian_filter1d(solar_elevation_angle_2nddev_clipped, 20)
ax4 = plt.subplot(gs[4])
ax4.set_title('Smoothing applied')
ax4.plot(smoothed_signal)

plt.tight_layout()
plt.show()

I can then easily identify the local maxima by using scipy's argrelmax function:

max_idx = argrelmax(smoothed_signal)[0]
print(max_idx)
# [ 689 1019 2356 2685]

Which correctly identifies the scanline indexes I was looking for:

fig, ax = plt.subplots()
ax.set_title('Solar elevation angle')
ax.set_xlabel('Scanline')
ax.set_ylabel('Solar elevation angle [deg]')
ax.plot(solar_elevation_angle)
ax.scatter(max_idx, solar_elevation_angle[max_idx], marker='x', color='red')
plt.show()

My question is: Is there a better way to approach this problem?
I find that having to manually specify the clipping threshold values to get rid of the noise and the sigma in the gaussian filter weakens this approach considerably, preventing it to be applied to other similar cases.

user6655984 · Accepted Answer

First improvement would be to use a Savitzky-Golay filter to find the derivative in a less noisy way. For example, it can fit a parabola (in the sense of least squares) to each data slice of certain size, and then take the second derivative of that parabola. The result is much nicer than just taking 2nd order difference with gradient. Here it is with window size 101:

savgol_filter(solar_elevation_angle, window_length=window, polyorder=2, deriv=2)

Second, instead of looking for points of maximum with argrelmax it is better to look for places where the second derivative is large; for example, at least half its maximal size. This will of course return many indexes, but we can then look at the gaps between those indexes to identify where each peak begins and ends. The midpoint of the peak is then easily found.

Here is the complete code. The only parameter is window size, which is set to 101. The approach is robust; the size 21 or 201 gives essentially the same outcome (it must be odd).

from scipy.signal import savgol_filter
window = 101
der2 = savgol_filter(solar_elevation_angle, window_length=window, polyorder=2, deriv=2)
max_der2 = np.max(np.abs(der2))
large = np.where(np.abs(der2) > max_der2/2)[0]
gaps = np.diff(large) > window
begins = np.insert(large[1:][gaps], 0, large[0])
ends = np.append(large[:-1][gaps], large[-1])
changes = ((begins+ends)/2).astype(np.int)
plt.plot(solar_elevation_angle)
plt.plot(changes, solar_elevation_angle[changes], 'ro')
plt.show()

The fuss with insert and append is because the first index with large derivative should qualify as "peak begins" and the last such index should qualify as "peak ends", even though they don't have a suitable gap next to them (the gap is infinite).

Piecewise linear fit

This is an alternative (not necessarily better) approach, which does not use derivatives: fit a smoothing spline of degree 1 (i.e., a piecewise linear curve), and notice where its knots are.

First, normalize the data (which I call y instead of solar_elevation_angle) to have standard deviation 1.

y /= np.std(y)

The first step is to build a piecewise linear curve that deviates from the data by at most the given threshold, arbitrarily set to 0.1 (no units here because y was normalized). This is done by calling UnivariateSpline repeatedly, starting with a large smoothing parameter and gradually reducing it until the curve fits. (Unfortunately, one can't simply pass in the desired uniform error bound).

from scipy.interpolate import UnivariateSpline
threshold = 0.1

m = y.size
x = np.arange(m)
s = m
max_error = 1
while max_error > threshold: 
  spl = UnivariateSpline(x, y, k=1, s=s)
  interp_y = spl(x)
  max_error = np.max(np.abs(interp_y - y))
  s /= 2
knots = spl.get_knots()
values = spl(knots)

So far we found the knots, and noted the values of the spline at those knots. But not all of these knots are really important. To test the importance of each knot, I remove it and interpolate without it. If the new interpolant is substantially different from the old (doubling the error), the knot is considered important and is added to the list of found slope changes.

ts = knots.size
idx = np.arange(ts)
changes = []
for j in range(1, ts-1):
  spl = UnivariateSpline(knots[idx != j], values[idx != j], k=1, s=0)
  if np.max(np.abs(spl(x) - interp_y)) > 2*threshold:
    changes.append(knots[j])
plt.plot(y)
plt.plot(changes, y[np.array(changes, dtype=int)], 'ro')
plt.show()

Ideally, one would fit piecewise linear functions to given data, increasing the number of knots until adding one more does not bring "substantial" improvement. The above is a crude approximation of that with SciPy tools, but far from best possible. I don't know of any off-the-shelf piecewise linear model selection tool in Python.

Using numpy/scipy to identify slope changes in digital signals?

Answers (1)

Piecewise linear fit

Related Questions