Reputation: 10378
The problem:
N nodes are related to each other by a 'closeness' factor ranging from 0 to 1, where a factor of 1 means that the two nodes have nothing in common and 0 means the two nodes are exactly alike.
If two nodes are both close to another node (i.e. they have a factor close to 0) then this doesn't mean that they will be close together, although probabilistically they do have a much higher chance of being close together.
-
The question:
If another node is placed in the set, find the node that it is closest to in the shortest possible amount of time.
This isn't a homework question, this is a real world problem that I need to solve - but I've never taken any algorithm courses etc so I don't have a clue what sort of algorithm I should be researching.
I can index all of the nodes before another one is added and gather closeness data between each node, but short of comparing all nodes to the new node I haven't been able to come up with an efficient solution. Any ideas or help would be much appreciated :)
Upvotes: 3
Views: 315
Reputation: 19601
ACM Surveys September 2001 carried two papers that might be relevant, at least for background. "Searching in Metric Spaces", lead author Chavez, and "Searching in High Dimensional Spaces - Index Structures for Improving the Performance of Multimedia Databases", lead author Bohm. From memory, if all you have is the triangle inequality, you can use it to some effect, but if you can trim your data down to a sensible number of dimensions, you can do better by using a search structure that knows about this dimensional structure.
Upvotes: 1
Reputation: 101139
Because your 'closeness' metric obeys the triangle inequality, you should be able to use a variant of BK-Trees to organize your elements. Adapting them to real numbers should simply be a matter of choosing an interval to quantize your number on, and otherwise using the standard Bk-Tree procedure. Some experimentation may be required - you might want to increase the resolution of the quantization as you progress down the tree, for instance.
Upvotes: 1
Reputation: 106068
If this closeness forms a linear spectrum (such that closeness to something implies closeness to other things that are close to it, and not being close implies not being close to those close), then you can simply do a binary or interpolation sort on insertion for closeness, handling one extra complexity: at each point you have to see if closeness increases or decreases below or above.
For example, if we consider letters - A is close to B but far from Z - then the pre-existing elements can be kept sorted, say: A, B, E, G, K, M, Q, Z. To insert say 'F', you start by comparing with the middle element, [3] G, and the one following that: [4] K. You establish that F is closer to G than K, so the best match is either at G or to the left, and we move halfway into the unexplored region to the left... 3/2=[1] B, followed by E, and we find E's closer to F, so the match is either at E or to its right. Halving the space between our earlier checks at [3] and [1], we test at [2] and find it equally-distant, so insert it in between.
EDIT: it may work better in probabilistic situations, and require less comparisons, to start at the ends of the spectrum and work your way in (e.g. compare F to A and Z, decide it's closer to A, see if A's closer or the halfway point [3] G). Also, it might be good to finish with a comparison to the closest few points either side of where the binary/interpolation led you.
Upvotes: 1
Reputation: 76888
It looks suspiciously like a Nearest Neighbor Search problem (also called a similarity search
)
Upvotes: 0
Reputation: 10184
If you want the optimal algorithm in terms of speed, but O(n^2) space, then for each node create a sorted list of other nodes (ordered by closeness).
When you get a new node, you have to add it to the indexed list of all the other nodes, and all the other nodes need to be added to its list.
To find the closest node, just find the first node on any node's list.
Since you already need O(n^2) space (in order to store all the closeness information you need basically an NxN matrix where A[i,j] represents the closeness between i and j) you might as well sort it and get O(1) retrieval.
Upvotes: 1
Reputation: 621
but short of comparing all nodes to the new node I haven't been able to come up with an efficient solution
Without any other information about the relationships between nodes, this is the only way you can do it since you have to figure out the closeness factor between the new node and each existing node. A O(n) algorithm can be a perfectly decent solution.
One addition you might consider - keep in mind we have no idea what data structure you are using for your objects - is to organize all present nodes into a graph, where nodes with factors below a certain threshold can be considered connected, so you can first check nodes that are more likely to be similar/related.
Upvotes: 1
Reputation:
Facebook has this thing where it puts you and all of your friends in a graph, then slowly moves everyone around until people are grouped together based on mutual friends and so on.
It looked to me like they just made anything <0.5 an attractive force, anything >0.5 a repulsive force, and moved people with every iteration based on the net force. After a couple hundred iterations, it was looking pretty darn good.
Note: this is not an algorithm it is a heuristic. In the facebook implementation I saw, two people were not able to reach equilibrium and kept dancing around each other. It turns out they were actually the same person with two different accounts.
Also, it took about 15 minutes on a decent computer and ~100 nodes. YMMV.
Upvotes: 0