Reputation: 59
I frequently use Python (and occasionally Excel) to process and compare test data between multiple experiments. In some cases the data might be out of sync which makes direct comparisons difficult. For example, a typical test specification would be:
1) Stabilize test temperature to a value of 20 +/- 2 degrees C
2) Hold test temperature at stabilized value for 15-25 seconds
3) Increase temperature by 20 degrees C at a rate of 0.5 degree C/second
It is straightforward to normalize the data so they both start with a nominal temperature value of 20 C at time = 0 seconds, but what I really want is to synchronize the data so that the temperature ramps begin at the same time.
I've tried simple algorithms to check the slope of data to identify when the temperature increase begins, but local fluctuations in the measurements due to instrumentation result in slopes that don't actually reflect the overall rate of change in temperature.
Are there functions in Numpy, Scipy, Pandas, etc. that can filter out these local fluctuations and identify when the temperature actually begins to increase.
I do occasionally work in Excel so if there is a more convenient way to do this in a spreadsheet I can use Excel to process the data.
Any suggestions would be appreciated.
Upvotes: 0
Views: 1803
Reputation: 114956
There are many ways to approach this. The first thing that comes to mind is to numerically differentiate the data, and look for the jump in the slope from 0 to 0.5. But (as you observed) noisy data can prevent this from working well. If you google for "numerical differentiation of noisy data", you'll find a lot of research on this topic, but I don't know of any off-the-shelf libraries in python. You might be able to make some progress using a Savitzky-Golay filter: scipy.signal.savgol_filter
.
However, that approach is probably overkill, since your signal has a very simple and specific expected structure: a constant interval followed by a ramp and then another constant. You might find that scipy.optimize.curve_fit
works fine for this. Here's an example:
from __future__ import division
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def ramp(t, temp_init, temp_final, t0, t1):
slope = (temp_final - temp_init) / (t1 - t0)
y = temp_init + np.minimum(slope * np.maximum(t - t0, 0.0), temp_final - temp_init)
return y
np.random.seed(123)
dt = 0.5
t = np.arange(0, 100, dt)
# Generate a sample to work with.
temp_init = 20.0 + np.random.randn()
temp_final = 40.0 + np.random.randn()
t0 = 25.0
t1 = t0 + 40
y = ramp(t, temp_init, temp_final, t0, t1)
y += 0.25*np.random.randn(*t.shape) # Add noise.
# Create an initial guess for the four parameters and use curve_fit
# to fit the ramp function to the sample.
T = t[-1] - t[0]
p0 = (20, 40, t[0] + 0.333*T, t[0] + 0.667*T)
popt, pcov = curve_fit(ramp, t, y, p0=p0)
fit_temp_init, fit_temp_final, fit_t0, fit_t1 = popt
print " Input Fit"
print "temp_init %6.2f %6.2f" % (temp_init, fit_temp_init)
print "temp_final %6.2f %6.2f" % (temp_final, fit_temp_final)
print "t0 %6.2f %6.2f" % (t0, fit_t0)
print "t1 %6.2f %6.2f" % (t1, fit_t1)
plt.plot(t, y, 'ro', alpha=0.15)
plt.plot(t, ramp(t, popt[0], popt[1], popt[2], popt[3]), 'k-', linewidth=1.5)
plt.grid(True)
plt.xlabel('t', fontsize=12)
plt.show()
This generates the output:
Input Fit
temp_init 18.91 18.91
temp_final 41.00 40.99
t0 25.00 24.85
t1 65.00 65.09
and the plot:
Upvotes: 2