Reputation: 28679
I have 2 data sets which contain sorted timestamps
I want to calculate how correlated they are.
We consider 2 timestamps match if they occur within a certain threshold of each other.
I have an O(M+N) algorithm which I believe works correctly, but I'm wondering if there's a better algorithm?
Essentially I walk the two arrays of timestamps, calculate the absolute time difference between each timestamp, and if its shorter than the threshold increment a counter.
I select the next timestamp from whichever array has the earliest of the two current timestamps, and repeat.
At the end, the correlation is the number of matches found divided by the data set size.
Here's pseudo code for what I have so far:
matches=0
i=0, j=0
while i < timestamps_1.size and j < timestamps_2.size
diff = abs(timestamps_1[i] - timestamps_2[j])
if diff < threshold
matches += 1
if timestamps_1[i] < timestamps_2[j]
i += 1
else
j += 1
correlation = matches / timestamps_2.size
Is there a better algorithm to achieve this?
Upvotes: 0
Views: 38
Reputation: 10445
There is no way to solve this that doesn’t, in the worst case, involve visiting every member of each array. In the best case, you might be able to only visit each member of one of the arrays, and one member of the other. Order is based on the worst case; thus O(n+m)
Upvotes: 2