10GeV
10GeV

Reputation: 475

Efficiently determining the similarity between two graphs

I have two "set" of graphs (that is, "xy" graphs on the Cartesian plane). All are similar in shape— effectively noise, except for a "peak" at some unknown point along the horizontal axis in each. For example:

enter image description here

In practice, these graphs consist of two C-style arrays (of type float), storing the points along the x and y axis, respectively. My goal is to pair the graphs in each "set" such that the peaks in each pair align as closely as possible with respect to the horizontal axis (that is, I want to iterate over one "set" and find the "best match" from the second "set"). I'm not concerned about the "breadth" of the peak, only that they are "well aligned".

In practice, I'm working with hundreds of thousands of such graphs and need to "pair them up" efficiently, time-wise. I've tried a few things, such as sorting the graphs based on the mean of their x or y coordinates, but didn't end up with "nicely-aligned" pairs (where I know that better pairings are possible).

For example, this pairing is not very "well aligned" (the pair consists of one graph, in yellow, and the pairing, in blue):

enter image description here

We can assume that the peaks are nearly always of roughly the same "height". The horizontal axis of the graphs are roughly the same range, but are of the precisely same "length".

EDIT:

It may help to clarify that I know that the peaks of the graphs will fall at approximately one of a handful of points along the x-axis (e.g. they may center roughly on x=5, x=10, or x=12). I don't, unfortunately, know where before-hand, nor precisely how many "points" there are (although on the order of 3 or 4). The goal is to group the graphs according to which point they (roughly) center on.

Upvotes: 0

Views: 382

Answers (1)

Ian Mercer
Ian Mercer

Reputation: 39277

Cross-correlation may be sufficient for what you describe. You can either slide one across the other summing the product of the values at the matching points to find the best alignment, or if as you describe you have a few known points at which the peak will occur, creating those known graph curves and then sliding them some delta amount left or right to see which of the peaks they align with.

As for efficiency, yes this sounds slow: thousands of sums of products, but modern CPUs are incredibly quick at processing such 'dumb' algorithms and may be able to vectorize the code to execute quickly.

Upvotes: 2

Related Questions