dhblah
dhblah

Reputation: 10151

Don't understand closest pair heuristic from "The Algorithm Design Manual "

There is almost exactly the same question. But I still don't understand, how is this heuristic working and in what sequence vertexes are passed through. Also there is a picture in a book: enter image description here

That shows comparison of nearest-neghbor heuristic and what I believe is a closest-pair heuristic. From the picture I may assume that on the top picture, 0 point was selected first, but on the bottom picture there was selected the leftmost or the rightmost one. Because there is nothing said about first point selection (also the closest-pair heuristic doesn't do any actions in that), I may assume that any algorithm results however good it is won't give you the bottom picture if it doesn't consider, what point to start with.

For now I just want to know, what steps closest-pair heuristic makes. A picture similar to the bottom one with numbers associated with each iteration along with explanation would be appreciated.

Here is the link to the book taken from that post.

Upvotes: 26

Views: 6970

Answers (6)

AColoredReptile
AColoredReptile

Reputation: 275

I had a hard time understanding this as well, but I think the important part is this line here:

A different idea might repeatedly connect the closest pair of endpoints whose connection will not create a problem, such as premature termination of the cycle.

So the idea becomes once we have connected nodes, we are only now considering the end points of that chain when deciding what else to connect to.

ClosestPair(P)
Let n be the number of points in set P.
For i = 1 to n − 1 do
d = ∞
For each pair of endpoints (s, t) from distinct vertex chains
if dist(s, t) ≤ d then sm = s, tm = t, and d = dist(s, t)
Connect (sm, tm) by an edge
Connect the two endpoints by an edge

So let d = infty, We have this setup of P to start: [-21, -5, -1, 0, 1, 3, 11] For each possible pair of distinct points (s,t) in P, of which there are 7(7-1)/2 = 21, we find the smallest to be (-1,0) and (0,1). The distance between these are equal, so connect them.

This is the important step, you now have this setup:

[-21, -5, (-1,0,1), 3, 11].

Your pairs to test now are only the end points, which for the single nodes is obvious, but for (-1,0,1) you can test both connections to -1 and 1. That's how it differs from nearest neighbor. And the smallest distance between 2 endpoints is ((-1,0,1), 3). So we connect them.

And so we now have this setup:

[-21, -5, (-1,0,1,3), 11]

I think this clears up the idea. This is distinct from nearest neighbor because we are always allowed to consider any possible endpoint connection in our set P, where as the nearest neighbor must start somewhere and iteratively grow the connections from where it starts. We are allowed to grow our chain from 0->-1, then from 0->1, then 1->3, then -1->-5. This algorithm doesn't necessarily say where you start your tour, it just tells you how the path should be connected.

For this part Row of vertices problem

Its hard to explain in text but consider connecting the $1- \epsilon $ paths, these are all the vertical connections. You then connect the $1+\epsilon$ paths, which gives you this snaking path with a top left end point and a bottom right endpoint, which you have no choice but to connect.

Upvotes: 0

Ahmed Kamal
Ahmed Kamal

Reputation: 11

I think in the top picture the start is zero so the algorithm used "local greedy" becomes inefficient, while in the second one, the starting point is in the corners so maybe the shortest tour is from one end to the other and then returns to our starting point. so the algorithm used is wrong and may give output that is not even close to the shortest tour of the set of points.

Upvotes: 1

mc01
mc01

Reputation: 3780

TL;DR - As I understand it, the "bad" approach connects increasingly distant unvisited points to each other directly, wasting more solder in multiple arcs and eventually connecting the last point back to the 1st point '0'.

The "better" approach moves outward from the center 1 point at a time in either direction, so still back & forth, but only laying solder between directly neighboring points along the 'number line'.

The illustrations & explanations really aren't clear in a single static image. He doesn't explain what the dotted line means (is it soldered? Is it just motion? Seems inconsistent). Agree this is frustrating to encounter so early in what's supposed to be the "clear" algorithms book ...

Both the "closest neighbor" AND "closest pair" examples seem to "hopscotch" back & forth over point 0, which is the reason given for the 1st approach being bad. He even says the 2nd approach should also alternate left-right, so how is it any better? Because you only connect directly neighboring points to each other, building up the complete line incrementally.

The "bad" example (as I understand it):

1. 0 connects to 1
2. 1 jumps over 0 & connects to -1
3. -1 jumps over 0 & connects to 3
4. 3 jumps over 0 & connects to -5
5. -5 jumps over 0 & connects to 11
6. 11 jumps over 0 & connects to -21
7. -21 connects back to 0, completing the cycle. 

These "hopscotch" back & forth over 0, but don't connect the points in a straight line. You wind up w/6 distinct 'lines' of soldered connections.

The "better" closest pair example:

1. 0 connects to 1.
2. 0 connects to -1
3. 1 connects to 3
4. -1 connects to -5
5. 3 connects to 11
6. -5 connects to -21
7. -21 connects to 11 
8. Unstated, the robot arm apparently travels back to 0 afterward?

These also go back & forth over 0, but incrementally extend the existing straight line by one point at a time at each step. Instead of 6 arcs of solder, you have a single line for most of it, and then the 2nd connection of the outermost endpoints.

Upvotes: 3

sumit kumar
sumit kumar

Reputation: 31

I was going through the same problem of understand this heuristic, my answer might help the others facing the same problem.

What does the robot need is a closed path which visits all the points and cycle backs to its original position. There is a statement mentioning premature termination which insists that the path should be full (visiting all points i.e.. we should not join some pair of vertices such that they create a smaller closed path).

Now going through the example below

Figure 1.3

here the by using the closest-pair heuristic we will find a path which connects all the points in a line and then connects the remaining end points with each other (-21 , 11). So whether robot starts with 0 or -21 or 11, it will be travelling the same distance (it will circle back to its starting position for next iteration). This distance will be the optimal distance.

But the above approach fails in the case below

Figure 1.4

here the closest pair path turns out to be the image on the left, while the optimal path should be the image on the right, hence the heuristic fails to give the correct solution.

Upvotes: 3

ricab
ricab

Reputation: 2812

I agree this is not very clear in the book (which is a bit of a downer as the reader encounters it right away - page 7 in my edition).

I think the difficulty here is not in the closest-pair heuristic itself. The key is noticing that the heuristic is not itself supposed to be the solution to the problem! Only a part (arguably the most important part) of an algorithm that is never fully described in the book (probably because this is intended as a wrong algorithm). With the heuristic, you get the pairs of vertexes that should be connected, but not the order in which they ought to be connected. For that, something more is needed.

For completeness, here is the problem statement from the book

Problem: Robot Tour Optimization

Input: A set S of n points in the plane

Output: What is the shortest cycle tour that visits each point in the set S

Now, the closest-pair heuristic, as defined in the book and quoted here, only gives you a set/list of connections, not the tour itself, and so not the required solution. To obtain the tour you would have to do something more. An overall (flawed!) solution using this strategy would look something like:

1) Initialize the output list of vertexes to visit as the empty list (call it RET).
2) Obtain the list of connections (vertex pairs) given by ClosestPair (let it be L)
3) If L is empty, jump to 12
4) Remove an arbitrary connection from L (call it C1).
5) Choose an arbitrary vertex from C1 (call it V1)
6) Append V1 to RET
7) Remove from L the other connection that contains V1 (call it C2)
8) Choose the other vertex from that connection (call it V2)
9) append V2 to RET
10) Set V1=V2
11) If L is not empty, jump back to 7
12) return RET

Or in pseudo-code

Alg(P): # P is the input set of points
    RET = []
    L = ClosestPairs(P)
    if(L.empty()):
        return RET
    C1 = L.getAndRemoveRandomElement()
    V1 = C1.getRandomVertex()
    RET.append(V1)
    while(!L.empty()):
        C2 = L.getAndRemoveElementContaining(V1)
        V2 = C2.getTheOtherVertex(V1)
        RET.append(V2)
        V1 = V2
    return RET

Upvotes: 15

Gordon Linoff
Gordon Linoff

Reputation: 1270421

I don't have the book, but it is showing a comparison of the nearest neighbor heuristic to the optimal solution for this data. The data shown here is (-21, -5, -1, 0, 1, 3, 11).

The confusion may be between a "local" greedy algorithm and a "global" greedy algorithm (for lack of better word). The nearest neighbor shown above is strictly local. The "robot" starts at 0 and chooses to go to 1, because it is the closest path. The robot is at 1, and finds the next closest point is -1. Then the robot is at -1 and the next closest point is 3, and so on.

The closest pair is more global. It is looking at all optimal edges at once. So, the algorithm starts at 0 and finds four that are exactly 1 unit apart (0, 1), (1, 0), (-1, 0), and (0, -1). It would add two distinct pairs creating the graph (-1, 0, 1). This could be either directed or non-directed.

Then it would repeat, and notice that (1, 3) is the next smallest edge, and so on, until it arrives at the optimal solution.

The difference is that in the nearest neighbor case, the robot can only look at the neighbors of where it is currently located. In the closest pair case, you can look at all edges to choose the smallest one.

Upvotes: 17

Related Questions