Synchronizing Data in Python (or Excel)

Question

I frequently use Python (and occasionally Excel) to process and compare test data between multiple experiments. In some cases the data might be out of sync which makes direct comparisons difficult. For example, a typical test specification would be:

1) Stabilize test temperature to a value of 20 +/- 2 degrees C
2) Hold test temperature at stabilized value for 15-25 seconds
3) Increase temperature by 20 degrees C at a rate of 0.5 degree C/second

It is straightforward to normalize the data so they both start with a nominal temperature value of 20 C at time = 0 seconds, but what I really want is to synchronize the data so that the temperature ramps begin at the same time.

I've tried simple algorithms to check the slope of data to identify when the temperature increase begins, but local fluctuations in the measurements due to instrumentation result in slopes that don't actually reflect the overall rate of change in temperature.

Are there functions in Numpy, Scipy, Pandas, etc. that can filter out these local fluctuations and identify when the temperature actually begins to increase.

I do occasionally work in Excel so if there is a more convenient way to do this in a spreadsheet I can use Excel to process the data.

Any suggestions would be appreciated.

Warren Weckesser · Accepted Answer

There are many ways to approach this. The first thing that comes to mind is to numerically differentiate the data, and look for the jump in the slope from 0 to 0.5. But (as you observed) noisy data can prevent this from working well. If you google for "numerical differentiation of noisy data", you'll find a lot of research on this topic, but I don't know of any off-the-shelf libraries in python. You might be able to make some progress using a Savitzky-Golay filter: scipy.signal.savgol_filter.

However, that approach is probably overkill, since your signal has a very simple and specific expected structure: a constant interval followed by a ramp and then another constant. You might find that scipy.optimize.curve_fit works fine for this. Here's an example:

from __future__ import division

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt


def ramp(t, temp_init, temp_final, t0, t1):
    slope = (temp_final - temp_init) / (t1 - t0)
    y = temp_init + np.minimum(slope * np.maximum(t - t0, 0.0), temp_final - temp_init)
    return y

np.random.seed(123)

dt = 0.5
t = np.arange(0, 100, dt)

# Generate a sample to work with.
temp_init = 20.0 + np.random.randn()
temp_final = 40.0 + np.random.randn()
t0 = 25.0
t1 = t0 + 40
y = ramp(t, temp_init, temp_final, t0, t1)
y += 0.25*np.random.randn(*t.shape)  # Add noise.

# Create an initial guess for the four parameters and use curve_fit
# to fit the ramp function to the sample.
T = t[-1] - t[0]
p0 = (20, 40, t[0] + 0.333*T, t[0] + 0.667*T)
popt, pcov = curve_fit(ramp, t, y, p0=p0)
fit_temp_init, fit_temp_final, fit_t0, fit_t1 = popt


print "             Input    Fit"
print "temp_init   %6.2f  %6.2f" % (temp_init, fit_temp_init)
print "temp_final  %6.2f  %6.2f" % (temp_final, fit_temp_final)
print "t0          %6.2f  %6.2f" % (t0, fit_t0)
print "t1          %6.2f  %6.2f" % (t1, fit_t1)

plt.plot(t, y, 'ro', alpha=0.15)
plt.plot(t, ramp(t, popt[0], popt[1], popt[2], popt[3]), 'k-', linewidth=1.5)
plt.grid(True)
plt.xlabel('t', fontsize=12)
plt.show()

This generates the output:

             Input    Fit
temp_init    18.91   18.91
temp_final   41.00   40.99
t0           25.00   24.85
t1           65.00   65.09

and the plot:

fit ramp

Synchronizing Data in Python (or Excel)

Answers (1)

Related Questions