Reputation: 541

Efficiently checking which of a large collection of nodes are close together?

I'm currently interested in generating random geometric graphs. For my particular problem, we randomly place node v in the unit square, and add an edge from v to node u if they have Euclidean distance <= D, where D=D(u,n) varies with u and the number of nodes n in the graph.

Important points:

It is costly to compute D, so I'd like to minimize the number of calls to this function.
The vast majority of the time, when v is added, edges uv will be added to only a small number of nodes u (usually 0 or 1).

Question: What is an efficient method for checking which vertices u are "close enough" to v?

The brute force algorithm is to compute and compare dist(v,u) and D(u,n) for all extant nodes u. This requires O(n²) calls to D.

I feel we should be able to do much better than this. Perhaps some kind of binning would work. We could divide the space up into bins, then for each vertex u, we store a list of bins where a newly placed vertex v could result in the edge uv. If v ends up placed outside of u's list of bins (which should happen most of the time), then it's too far away, and we don't need to compute D. This is somewhat of a off-the-top-of-my-head suggestion, and I don't know if it'd work well (e.g., there would be overhead in computing sufficiently close bins, which might be too costly), so I'm after feedback.

Upvotes: 1

Answers (2)

Vincent van der Weele

Reputation: 13177

I would probably just use a binning approach.

Say we cut the unit square in m x m subsquares (each having side length 1/m of course). Since you place your vertices uniformly at random (or so I assumed), every square will contain n / m^2 vertices on average.

Depending on A1, A2, m and n, you can probably determine the maximum radius you need to check. Say that's less than m. Then, after inserting v, you would need to check the square in which it landed, plus all adjacent squares. Anyway, this is a constant number of squares, so for every insertion you'll need to check O(n / m^2) other vertices on average.

I don't know the best value for m (as said, that depends on A1 and A2), but say it would be sqrt(n), then your entire algorithm could run in O(n) expected time.

EDIT A small addition: you could keep track of vertices with many neighbors (so with high radius, which extends over multiple squares) and check them for every inserted vertex. There should only be few, so that's no problem.

Upvotes: 2

Ben S

Reputation: 69342

Based on your description of the problem, I would choose an R-tree as your data structure.

It allows for very fast searching by narrowing the set of vertices you need to run D against drastically. However, in the worst-case insertion, O(n) time is required. Thankfully, you're quite unlikely to hit the worst-case insertion with a typical data set.

Upvotes: 2

Efficiently checking which of a large collection of nodes are close together?

Answers (2)

Related Questions