Reputation: 21140
I have path set A and path set B. I am trying to find an algorithm to compare both path sets for similarity.
Path characteristics:
Scale should be taken into account, i.e. a small X should match a large X. Translation does not need to be taken into account for any paths, because the bottom-most point of any path will have a y of 0 and the left-most point will have an x of 0.
Is there a best practice or well known algorithm (I have found little in my Google searches) to compare these kinds of path sets for similarity?
Upvotes: 6
Views: 4668
Reputation: 2158
Disclaimer: I am a layman in image processing. All the content in this answer is based on my conjecture; it has not been tested and is not supported by literature.
I think we can make use of the concept of the vertices of an object. The objects concerned here are 1D lines, so the vertices should be the end points of the lines.
For example, for the image "X", assuming there are two lines, there should be four vertices, two per line.
Now the image "X" can in fact arise from four lines, each joining at the center. Then a naive count of vertices gives eight, which is not quite what we want. One way to reduce this count to four is to merge lines by traversing their neighborhoods. Imagine forming an edge between two points whenever they are within one vertical, horizontal, or diagonal hop of each other. Then we start from a random vertex and run DFS on that graph, which gives a set of dead ends as the vertices. This yields four vertices instead of eight.
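To make the idea concrete, here is a minimal sketch of the dead-end detection, assuming each image has already been reduced to a set of integer pixel coordinates along 1-pixel-wide strokes (the class and method names are my own). For such strokes, a DFS dead end is exactly a pixel with a single 8-neighbour, so the sketch checks degrees directly rather than running an explicit DFS:

using System.Collections.Generic;

static class VertexFinder
{
    // Returns the "dead end" pixels: those with exactly one 8-neighbour.
    // For 1-pixel-wide strokes these match the DFS dead ends described above.
    public static List<(int X, int Y)> FindVertices(HashSet<(int X, int Y)> pixels)
    {
        var vertices = new List<(int X, int Y)>();
        foreach (var p in pixels)
        {
            int neighbours = 0;
            for (int dx = -1; dx <= 1; dx++)
                for (int dy = -1; dy <= 1; dy++)
                    if ((dx != 0 || dy != 0) && pixels.Contains((p.X + dx, p.Y + dy)))
                        neighbours++;
            if (neighbours == 1)
                vertices.Add(p);
        }
        return vertices;
    }
}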
For two images to be the same in your question, they at least need to have the same number of vertices. The distances between the vertices should be small when the images are optimally aligned, so we can possibly pair the vertices greedily to find that alignment: find the closest pair between the images, then the next closest, and so on until all vertices are paired. The similarity between the images can then be something like the root mean square of the Euclidean distances of the pairs.
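Here is a sketch of that greedy pairing and RMS score, assuming both images yielded the same number of vertices (all names are mine):

using System;
using System.Collections.Generic;

static class PathSimilarity
{
    // Repeatedly pairs the globally closest remaining vertices, then
    // returns the root mean square of the paired Euclidean distances.
    public static double GreedyRmsDistance(List<(double X, double Y)> a,
                                           List<(double X, double Y)> b)
    {
        var restA = new List<(double X, double Y)>(a);
        var restB = new List<(double X, double Y)>(b);
        int pairs = restA.Count;
        if (pairs == 0)
            return 0;

        double sumSquares = 0;
        while (restA.Count > 0)
        {
            // Find the closest (a, b) pair among the still-unpaired vertices.
            int bestI = 0, bestJ = 0;
            double bestSq = double.MaxValue;
            for (int i = 0; i < restA.Count; i++)
                for (int j = 0; j < restB.Count; j++)
                {
                    double dx = restA[i].X - restB[j].X;
                    double dy = restA[i].Y - restB[j].Y;
                    double sq = dx * dx + dy * dy;
                    if (sq < bestSq) { bestSq = sq; bestI = i; bestJ = j; }
                }
            sumSquares += bestSq;
            restA.RemoveAt(bestI);
            restB.RemoveAt(bestJ);
        }
        return Math.Sqrt(sumSquares / pairs);
    }
}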
Or, if the number of vertices is small enough, just optimize over the O(N^3) possible pairs (I think it's a sum of decreasing squares: N^2 + (N-1)^2 + ...). That should give a better result.
I won't try this, because I am lazy... My imagination flies like a pig. Cheers!
Upvotes: 0
Reputation: 129707
Algorithmically, I think I would try something like this:
For each path, convert the consecutive pairs of points comprising the path into a list of vectors, where a vector is defined as a pairing of a magnitude (length) and a direction (an angle relative to the X-axis). You can compute these values like this (C#):
// Displacement from the segment's start point to its end point.
double dx = endPoint.X - startPoint.X;
double dy = endPoint.Y - startPoint.Y;
// Length of the segment (Pythagorean theorem).
double magnitude = Math.Sqrt((dx * dx) + (dy * dy));
// Angle relative to the X-axis, in degrees (Atan2 returns radians).
double direction = Math.Atan2(dy, dx) * (180 / Math.PI);
Next, "normalize" each vector sequence by combining consecutive vectors that have the same* direction. In other words, replace those with a new vector that has the same direction and the sum of their magnitudes. This will take care of the cases where you have more than two points on the same line anywhere on your paths. After this step you should have the same number of vectors in each sequence. (If not, the paths are not similar.)
Figure out the scaling factor. Take the magnitude of the first vector in the first sequence and divide it by the magnitude of the first vector in the second sequence.
Now you can compare the sequences for similarity by iterating over both sequences in tandem. For each corresponding pair of vectors, check that their directions are equal* and that the ratio of their magnitudes is equal* to the scaling factor. If not, the paths are not similar.
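Putting steps 3 and 4 together, here is a sketch that reuses the Vector class from the previous snippet (the helper names are mine, and the naive tolerance check is discussed in the footnote below):

static class PathComparer
{
    public static bool AreSimilar(List<Vector> a, List<Vector> b, double tolerance)
    {
        if (a.Count != b.Count || a.Count == 0)
            return false; // mismatched (or empty) sequences after normalization

        // Step 3: scaling factor from the first pair of vectors.
        double scale = a[0].Magnitude / b[0].Magnitude;

        // Step 4: walk both sequences in tandem.
        for (int i = 0; i < a.Count; i++)
        {
            if (!ApproximatelyEqual(a[i].Direction, b[i].Direction, tolerance))
                return false;
            if (!ApproximatelyEqual(a[i].Magnitude / b[i].Magnitude, scale, tolerance))
                return false;
        }
        return true;
    }

    // Naive absolute-difference comparison; see the footnote below for caveats.
    static bool ApproximatelyEqual(double x, double y, double tolerance)
        => Math.Abs(x - y) <= tolerance;
}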
*When checking whether two double values are "equal", you must keep in mind that not every real number can be accurately represented by a double, so you cannot directly compare two doubles and expect accurate results. Instead you should decide on an error tolerance appropriate for your situation and determine whether the difference between the values you are comparing is within that tolerance. See What is the most effective way for float and double comparison? for extensive treatment of the subject.
Upvotes: 4