Steve Lorimer
Steve Lorimer

Reputation: 28679

Improve on O(m+n) algorithm for calculating correlation?

I have 2 data sets which contain sorted timestamps

I want to calculate how correlated they are.

We consider 2 timestamps match if they occur within a certain threshold of each other.

I have an O(M+N) algorithm which I believe works correctly, but I'm wondering if there's a better algorithm?

Essentially I walk the two arrays of timestamps, calculate the absolute time difference between each timestamp, and if its shorter than the threshold increment a counter.
I select the next timestamp from whichever array has the earliest of the two current timestamps, and repeat.
At the end, the correlation is the number of matches found divided by the data set size.

Here's pseudo code for what I have so far:

matches=0
i=0, j=0

while i < timestamps_1.size and j < timestamps_2.size
    diff = abs(timestamps_1[i] - timestamps_2[j])
    if diff < threshold
        matches += 1
    if timestamps_1[i] < timestamps_2[j]
        i += 1
    else
        j += 1
correlation = matches / timestamps_2.size            

Is there a better algorithm to achieve this?

Upvotes: 0

Views: 38

Answers (1)

mevets
mevets

Reputation: 10445

There is no way to solve this that doesn’t, in the worst case, involve visiting every member of each array. In the best case, you might be able to only visit each member of one of the arrays, and one member of the other. Order is based on the worst case; thus O(n+m)

Upvotes: 2

Related Questions