Reputation: 875

Aligning two data sets in Python

I want to write some Python code to align datasets obtained by different instruments recording the same event.

As an example, say I have two sets of measurements:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Define some data
data1 = pd.DataFrame({
    'TIME': [1.1, 2.4, 3.2, 4.1, 5.3],
    'VALUE': [10.3, 10.5, 11.0, 10.9, 10.7],
    'ERROR': [0.2, 0.1, 0.4, 0.3, 0.2],
})

data2 = pd.DataFrame({
    'TIME': [0.9, 2.1, 2.9, 4.2],
    'VALUE': [18.4, 18.7, 18.9, 18.8],
    'ERROR': [0.3, 0.2, 0.5, 0.4],
})

# Plot the data
plt.errorbar(data1.TIME, data1.VALUE, yerr=data1.ERROR, fmt='ro')
plt.errorbar(data2.TIME, data2.VALUE, yerr=data2.ERROR, fmt='bo')
plt.show()

The result is plotted here: [image: data1 and data2 plotted with error bars]

What I would like to do now is align the second dataset (data2) to the first one (data1), i.e. to get this: [image: data2 shifted to overlap data1]

The second dataset must be shifted to match the first one by subtracting a constant (to be determined) from all its values. All I know is that the datasets are correlated since the two instruments are measuring the same event but with different sampling rates.

At this stage I do not want to make any assumptions about what function best describes the data (fitting will be done after alignment).

I am cautious about using means to perform the shift, since that may produce bad results depending on how the data is sampled. I was considering taking each data2[TIME_i], working out the shortest distance to data1[~TIME_i], and then minimizing the sum of those distances. But I am not sure that would work well either.

Does anyone have any suggestions on a good method to use? I looked at mlpy but it seems to only work on 1D arrays.

Thanks.

Upvotes: 1

Answers (2)

Exelian

Reputation: 5888

You can calculate the offset between the averages of the two datasets and subtract it from every value of the second dataset. The two should then align relatively well. This assumes both datasets look relatively similar, so it might not work the best.
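A minimal sketch of that idea, reusing the data from the question. The names offset and data2_aligned are only illustrative, and the plain unweighted mean is an assumption (the ERROR column is ignored here):

```python
import pandas as pd

data1 = pd.DataFrame({
    'TIME': [1.1, 2.4, 3.2, 4.1, 5.3],
    'VALUE': [10.3, 10.5, 11.0, 10.9, 10.7],
    'ERROR': [0.2, 0.1, 0.4, 0.3, 0.2],
})
data2 = pd.DataFrame({
    'TIME': [0.9, 2.1, 2.9, 4.2],
    'VALUE': [18.4, 18.7, 18.9, 18.8],
    'ERROR': [0.3, 0.2, 0.5, 0.4],
})

# Offset between the averages of the two series (unweighted, an assumption)
offset = data2.VALUE.mean() - data1.VALUE.mean()

# Shift data2 by that constant; TIME and ERROR are left untouched
data2_aligned = data2.assign(VALUE=data2.VALUE - offset)
```

After the shift both series have the same mean, which is only a sensible alignment if the two instruments cover roughly the same part of the event.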

Although this question is not Matlab related, you might still be interested in this: Remove unknown DC Offset from a non-periodic discrete time signal

Upvotes: 2

sebix

Reputation: 3239

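You can subtract the mean of the difference: data2.VALUE-(data2.VALUE - data1.VALUE).mean(). Note that pandas aligns the two Series on their row index, so this pairs samples by position rather than by time; the unmatched fifth row of data1 yields NaN, which mean() skips by default.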

import pandas as pd
import matplotlib.pyplot as plt

# Define some data
data1 = pd.DataFrame({
    'TIME': [1.1, 2.4, 3.2, 4.1, 5.3],
    'VALUE': [10.3, 10.5, 11.0, 10.9, 10.7],
    'ERROR': [0.2, 0.1, 0.4, 0.3, 0.2],
})

data2 = pd.DataFrame({
    'TIME': [0.9, 2.1, 2.9, 4.2],
    'VALUE': [18.4, 18.7, 18.9, 18.8],
    'ERROR': [0.3, 0.2, 0.5, 0.4],
})

# Plot the data
plt.errorbar(data1.TIME, data1.VALUE, yerr=data1.ERROR, fmt='ro')
plt.errorbar(data2.TIME, data2.VALUE-(data2.VALUE - data1.VALUE).mean(),
             yerr=data2.ERROR, fmt='bo')
plt.show()

[image: aligned error bars]
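If the rows of the two frames do not line up in time, one option is to pair the samples on TIME instead of on the row index. The following is only a sketch of that variation, not part of the code above: it uses pandas.merge_asof with direction='nearest', and the names pairs, offset and aligned are illustrative.

```python
import pandas as pd

data1 = pd.DataFrame({
    'TIME': [1.1, 2.4, 3.2, 4.1, 5.3],
    'VALUE': [10.3, 10.5, 11.0, 10.9, 10.7],
    'ERROR': [0.2, 0.1, 0.4, 0.3, 0.2],
})
data2 = pd.DataFrame({
    'TIME': [0.9, 2.1, 2.9, 4.2],
    'VALUE': [18.4, 18.7, 18.9, 18.8],
    'ERROR': [0.3, 0.2, 0.5, 0.4],
})

# Pair each data2 sample with the data1 sample closest in time.
# merge_asof requires both frames to be sorted by the key column.
pairs = pd.merge_asof(
    data2.sort_values('TIME'),
    data1.sort_values('TIME'),
    on='TIME',
    direction='nearest',
    suffixes=('_2', '_1'),
)

# Mean difference over the time-matched pairs, then shift data2 by it
offset = (pairs.VALUE_2 - pairs.VALUE_1).mean()
aligned = data2.VALUE - offset
```

For this particular data the nearest-in-time pairs happen to be the same as the row-by-row pairs, so the offset does not change. merge_asof also accepts a tolerance argument if pairs that are too far apart in time should be dropped.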

Another possibility is to subtract each series' own mean from its values.
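A short sketch of what that could look like. Here both series end up centred on zero rather than data2 being moved onto data1; the names centered1 and centered2 are illustrative.

```python
import pandas as pd

data1 = pd.DataFrame({
    'TIME': [1.1, 2.4, 3.2, 4.1, 5.3],
    'VALUE': [10.3, 10.5, 11.0, 10.9, 10.7],
    'ERROR': [0.2, 0.1, 0.4, 0.3, 0.2],
})
data2 = pd.DataFrame({
    'TIME': [0.9, 2.1, 2.9, 4.2],
    'VALUE': [18.4, 18.7, 18.9, 18.8],
    'ERROR': [0.3, 0.2, 0.5, 0.4],
})

# Remove each series' own mean so that both are centred on zero
centered1 = data1.VALUE - data1.VALUE.mean()
centered2 = data2.VALUE - data2.VALUE.mean()
```

The error bars are unchanged by a constant shift, so data1.ERROR and data2.ERROR can still be passed to plt.errorbar as before.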

Upvotes: 4
