Gavin S

Reputation: 738

How to find the alignment of two data sets in pandas

Presented as an example.

Two data sets. One collected over a 1 hour period. One collected over a 20 min period within that hour.

Each data set contains instances of events that can be transformed into single columns of true (-) or false (_), representing whether the event is occurring or not.

DS1.event:

_-__-_--___----_-__--_-__---__

DS2.event:

__--_-__--

I'm looking for a way to automate the correlation (correct me if the terminology is incorrect) of the two data sets and find the offset(s) into DS1 at which DS2 is most (top x many) likely to have occurred. This will probably end up with some matching percentage that I can then threshold to determine the validity of the match.

Such that

_-__-_--___----_-__--_-__---__
                 __--_-__--

DS1.start + 34min ~= DS2.start

Additional information:
DS1 was recorded at roughly 1 Hz. DS2 at roughly 30 Hz. This makes it less likely that there will be a 100% clean match.

Alternate methods (to pandas) will be appreciated, but python/pandas are what I have at my disposal.

Upvotes: 3

Views: 1819

Answers (1)

Simon

Reputation: 10160

Sounds like you just want something like a cross correlation?

I would first convert the string to a numeric representation, replacing - and _ with 1 and 0.

You can do that using the string's replace method, once per character (e.g. signal.replace("-", "1").replace("_", "0"))

Convert them to a list or a numpy array:

event1 = [int(x) for x in signal1]
event2 = [int(x) for x in signal2]
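The question mentions that DS1 was recorded at roughly 1 Hz and DS2 at roughly 30 Hz, so the two arrays probably need a common sample rate before correlating them. Below is a minimal sketch of one way to do that with pandas, assuming the 30 Hz data has a time index. The synthetic series `ds2`, the majority-vote threshold of 0.5, and the 1-second bin width are all illustrative assumptions, not something taken from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical 30 Hz event column: 4 seconds of samples,
# with the event active during seconds 1 and 2 only.
t = pd.to_timedelta(np.arange(120) / 30, unit="s")
ds2 = pd.Series(0, index=t)
ds2.iloc[30:90] = 1

# Downsample to 1 Hz: a second counts as "event on" when more than
# half of the samples that fall inside it are on (assumed rule).
ds2_1hz = (ds2.resample(pd.Timedelta(seconds=1)).mean() > 0.5).astype(int)

event2 = ds2_1hz.to_numpy()
print(event2)  # [0 1 1 0]
```

A different aggregation (for example `.max()` instead of a majority vote) may suit short events better; which one is right depends on how the events were logged.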

Then calculate the cross correlation between them:

import numpy as np

xcor = np.correlate(event1, event2, "full")

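That will give you the cross correlation value at each time lag. You just want to find the largest value, and the time lag at which it happens. Note that in "full" mode the index into the output is shifted by len(event2) - 1, so subtract that to get the offset into event1:

nR = max(xcor)
maxLag = np.argmax(xcor)  # index into xcor; I imported numpy as np here
offset = maxLag - (len(event2) - 1)  # position in event1 where event2 starts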

Giving you something like:

Cross correlation value: 5
Lag: 20

It sounds like you're more interested in the lag value here. The lag tells you how many time/positional shifts are required to get the maximum cross correlation value (degree of match) between your two signals.
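As a rough end-to-end sketch, here are the steps run on the two example strings from the question. The helper name `to_bits` is made up for illustration. The sketch also lists every offset that reaches the maximum, since np.argmax only reports the first one.

```python
import numpy as np

signal1 = "_-__-_--___----_-__--_-__---__"  # DS1.event from the question
signal2 = "__--_-__--"                      # DS2.event from the question

def to_bits(signal):
    """Map '-' to 1 and '_' to 0 (hypothetical helper)."""
    return np.array([1 if ch == "-" else 0 for ch in signal])

event1, event2 = to_bits(signal1), to_bits(signal2)
xcor = np.correlate(event1, event2, "full")

# In "full" mode, output index i corresponds to event2 starting
# at position i - (len(event2) - 1) in event1.
offsets = np.arange(len(xcor)) - (len(event2) - 1)
best = offsets[xcor == xcor.max()]

print(xcor.max(), best)  # 5 [11 17]
```

With a 0/1 encoding the score only counts samples where both signals are "on", so offsets 11 and 17 tie here even though only offset 17 is the exact alignment shown in the question. Ties like this are worth checking before trusting a single argmax.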

You might want to take a look at the docs for np.correlate and np.convolve to choose the mode ("full", "same", or "valid") you want to use, as that depends on the length of your data and on what you want to happen when your signals are different lengths.
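For the "top x offsets with a matching percentage" part of the question, one possible variation (not the only one) is to encode the signals as +1/-1 instead of 1/0 and use "valid" mode, so that only full overlaps are scored and mismatches count against an offset. The function name `top_offsets` and the `top_x` parameter are hypothetical, and the sketch assumes both signals are already at the same sample rate and that the first one is the longer.

```python
import numpy as np

def top_offsets(long_sig, short_sig, top_x=3):
    """Return up to top_x (offset, match_fraction) pairs, best first."""
    a = np.where(np.asarray(long_sig) > 0, 1, -1)
    b = np.where(np.asarray(short_sig) > 0, 1, -1)
    # With +1/-1 values, each entry is (matches - mismatches) at that offset.
    score = np.correlate(a, b, "valid")
    # Convert to the fraction of samples that agree, between 0 and 1.
    frac = (score + len(b)) / (2 * len(b))
    order = np.argsort(-frac, kind="stable")[:top_x]
    return [(int(i), float(frac[i])) for i in order]

ds1 = [1 if ch == "-" else 0 for ch in "_-__-_--___----_-__--_-__---__"]
ds2 = [1 if ch == "-" else 0 for ch in "__--_-__--"]

result = top_offsets(ds1, ds2)
print(result[0])  # (17, 1.0)
```

The match fraction can then be compared against whatever threshold you decide counts as a valid match.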

Upvotes: 5
