Vivek
Vivek

Reputation: 1336

Algorithm for shortest path from muptiple Sets

This is an interview Question (I saw it on a forum and am not able to figure out the best solution). The problem is from a given Set of numbers find the shortest path.

eg.

Set A - [2, 14, 34]
Set B - [9, 13]
Set C - [15, 22, 62, 78]
Set D - [16, 24, 54]
Set Z - [17, 38, 41]

1) There can be any number of sets

2) The numbers inside the set will never repeat.

3) The numbers can range from any start to any end (they are not between 0 - n, i.e. It can start from 1091 to 1890 etc)

4) All the sets are sorted.

in the above example the path will be:

B[13] -> A[14] -> C[15] -> D[16] -> Z[17]

The shortest path is defined as the difference between MAX number (17) - MIN Number (13) = 4;

Any ideas ?

Upvotes: 4

Views: 453

Answers (5)

Tian
Tian

Reputation: 11

a heap (priority queue) might help.

  1. merge sort all data into an array N, also keep original set id, assume there are m sets in total;
  2. int shortest = MAX(N) - MIN(N); // that is N[N.length - 1] - N[0]
  3. init a heap h;
  4. loop through N with i, if h does not contain element from the same set as N[i], add N[i] to heap; if h already contains an element from the same set, say h[k], increase the key of h[k] to N[i]. If h.size() == m, shortest == N[i] - h[0] < shortest ? N[i] - h[0]: shortest.

here is code:

mergesort(all_sets[], H, S); // H holds all data, S holds corresponding setid.
Heap<key, setid> H = new Heap<key, setid>();
int shortest = N[N.length - 1] - n[0];
for(int i = 0; i < N.length; i++)
{
   int data = N[i];
   int setID = S[i];
   int hindex = H.elementFromSet(setID);
   if(hindex < 0)
    { // H does not have any element from set with setID;
       H.add(data, setID);
    } else {
       H.increase(data, hindex);
    }
    if(H.size() == m)
    {
       shortest = shortest > N[i] - H[0]? N[i] - H[0] : shortest;
    }
}

Maybe I can use a hashtable to keep tracking set id to heap index.

the runtime I believe is O(nlgm).

Upvotes: 1

Sanjeev Satheesh
Sanjeev Satheesh

Reputation: 424

Here's an alternative formulation of the problem.

Q: Find the smallest interval which contains an element from all the sets.

A:

  1. Put all the elements in a single bucket and sort them. Complexity O(N*K)
  2. We will find the largest number such that there is at least one element from each set higher than this number by binary search. This will be MIN.
  3. Similarly, find the smallest number such that there is at least one element from each set smaller than this number by binary search. This will be MAX.

As an optimization, you can store each set as an interval, and use an interval tree for queries in steps 2 and 3. That way the query complexity changes from O(K) to O(log K)

Upvotes: 0

maniek
maniek

Reputation: 7307

make a list of pairs [number, name_of_set]. Sort it.

For a given length of path, D, scan the sorted list, keeping 2 pointers. Always increase the first pointer, and increase the second while the the spread is larger than D. While scanning, keep counts of elements between pointers belonging to each set. if there is element from each set, bingo, You found a path with difference at most D.

Now, binary search for D.

overall complexity O(N log N)

Upvotes: 3

Generic Human
Generic Human

Reputation: 6117

You can apply essentially the same idea as the algorithm I described in this question.

Let's look for the center of the final subset. It must minimize the maximum distance to each of the sets. As usual, the distance of a point to a set is defined as the minimum distance between the point and an element of the set.

For each set i, the function fi describing the distance to the set is piecewise linear. If a,b are two consecutive numbers, the relations fi(a) = 0, fi((a+b)/2) = (b-a)/2, fi(b) = 0 let us build a description of all the fi in linear time.

But we can also compute the maximum of two piecewise functions fi and fj in linear time, by considering the consecutive intervals [a,b] where they are linear: either the result is linear, or it is piecewise linear by adding the unique intersection point of the functions to the partition. Since the slopes of our functions are always +1 or -1, the intersection point is a half-integer so it can be represented exactly in floating-point (or fixed-point) arithmetic.

A convexity argument shows that the maximum g of all the fi can only have at most twice as many points as the fi, so we don't have to worry about the maximum having a number of points that would be exponential in the number of sets.

So we just:

  1. Compute the piecewise linear distance function fi for i = 1..p.
  2. Compute the maximum g of all the fi by repeatedly computing the maximum.
  3. The location of any minimum point of g is the desired center.
  4. For each set, pick the closest point to the center.
  5. The width of the set of points we picked is exactly the minimum of g :-)

Complexity is O(N) if the number of sets is bounded, or O(N p) if the number of sets p is variable. By being smart about how you compute the maximum (divide-and-conquer), I think you can even reduce it to O(N log p).

Upvotes: 0

Achilles
Achilles

Reputation: 1065

  1. Take Set A and Set B . Find shortest path in this set. This will be 14-13 . Now sort it , so that it becomes 13- 14. Now the short set short = {13,14}

  2. Take short set { 13, 14} and set C {15,22,62,78}. Now start node is 13 and end node is 14 in the short set. Beginning from the end node 14 the shortest reachable path is 15. So add the 15 to the short set. Now short set becomes { 13, 14 , 15}, sort it so that it remains {13, 14, 15}

  3. Now take short set {13,14,15} and set D { 16 , 24 , 54} The end node in short set is 15. So we begin from there. Now shortest path from 25 to set D is 16. So add 16 to the short set. Now short set becomes { 13,14,15,16}. Sort it . It remains {13,14,15,16}

3 . We can repeat this for the entire sets to get the resultant short set.

Upvotes: 0

Related Questions