user1669710

Reputation: 234

Selecting an integer from each of the given n sets of integers such that the sum of their pairwise distances is minimized

I don't know if this is a well-known problem or is reducible to one, but here is what I have been struggling with for the last few days.

Let X_1, X_2, X_3 .. X_n be sets of integers.

In each set X_i, we have m elements x_i,j : j = 1...m (the same m for all sets).

My goal is to pick a single element from each set, such that the sum of pairwise differences between all the selected elements is least possible.

I started off with a slightly different problem: minimizing the sum of serial differences (that is, the sum of differences between elements selected from consecutive sets). The forward computation in the DP is as follows:

Let F(i, j) be the minimum cost of a chain through the first i sets that ends by selecting element j from set i.

F(i, j) = min_k [ F(i-1, k) + |x_{i-1,k} - x_{i,j}| ]

That is, we select the k-th element in the previous set and then evaluate the cost of selecting the j-th element in the current set. The cost of the current step equals the absolute difference between the k-th element of set i-1 and the j-th element of set i.
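In code, the recurrence looks like this (a small Python sketch; the function name and the traceback bookkeeping are my own additions, not part of any library):

```python
def serial_dp(X):
    """Minimize sum |a_i - a_{i+1}| picking a_i from X[i]; returns (cost, chain)."""
    F = [0] * len(X[0])               # F[j]: best cost of a chain ending at X[0][j]
    parent = [[None] * len(X[0])]     # parent pointers for traceback
    for i in range(1, len(X)):
        newF, par = [], []
        for x in X[i]:
            costs = [F[k] + abs(X[i - 1][k] - x) for k in range(len(X[i - 1]))]
            k = min(range(len(costs)), key=costs.__getitem__)
            newF.append(costs[k])
            par.append(k)
        F, parent = newF, parent + [par]
    # trace the optimal chain back from the best final element
    j = min(range(len(F)), key=F.__getitem__)
    best, chain = F[j], []
    for i in range(len(X) - 1, -1, -1):
        chain.append(X[i][j])
        j = parent[i][j]
    return best, chain[::-1]
```

On the example from Edit 1 below, this returns cost 4 with the chain 11, 12, 13, 15, which is exactly the serial-optimal (but pairwise-suboptimal) answer discussed there.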

However, my real problem does not depend on the order of the sets X_1, ..., X_n. In essence, the dynamic program gives me something, but, as I said, it solves a related yet different problem: finding the "chain" of elements, one from each set, that minimizes the sum of "serial" differences.

From a basic analysis, I see that a naive approach would be to generate all permutations of the n sets and solve each with the dynamic program above. This is intractable, but when n is small enough we can even take this dumb exhaustive approach.

I am unable to figure out if this problem can be solved using a polynomial algorithm at all, or if it can be reduced to one of the known NP-hard/complete problems in which case, I will go for an approximate algorithm by modeling it as a quadratic program.

Please advise me or point me to some reading material.

Thanks.

[Edit 1]

Adding an example here for the convenience of discussion:

X = [[11, 24, 38], [12, 21, 33], [13, 22], [15, 23]]

The solution is 24, 21, 22, 23. (Possibly irrelevant, but my DP gives me 11, 12, 13, 15; I constructed this example specifically to make my DP fail.)
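For instances this small, the pairwise objective can be checked exhaustively (a brute-force sketch, O(m^n), intended only for validation; the function names are mine):

```python
from itertools import combinations, product

def pairwise_cost(sel):
    """Sum of |a - b| over all unordered pairs of selected elements."""
    return sum(abs(a - b) for a, b in combinations(sel, 2))

def brute_force(X):
    """Try every way of picking one element per set; feasible only for tiny n."""
    return min(product(*X), key=pairwise_cost)

X = [[11, 24, 38], [12, 21, 33], [13, 22], [15, 23]]
# brute_force(X) picks 24, 21, 22, 23 with pairwise cost 10,
# while the serial-optimal chain 11, 12, 13, 15 has pairwise cost 13.
```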

[Edit 2]

I guess this is not an NP-complete problem. A solution that expands on the DP is as follows [not sure if correct, but it seems plausible]:

The solution contains at least one element from each set.

So, let us select any set (preferably the smallest, if the sizes vary) and remove it from the list.

Call the removed set Xp, and the remaining collection X\Xp.

Now, for each element x in Xp, construct a new instance X' = (X\Xp) ∪ {{x}}, i.e. the remaining sets plus the singleton {x}.

We solve the DP m times (once per choice of x), compute the objective function (the sum of pairwise distances) for each result, and then pick the best of them.

The DP itself takes O(nm^2), and we run it m times, so we have O(nm^3) overall.
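Here is a sketch of one possible reading of this procedure in Python. Note the simplification: instead of re-running the serial DP for each fixed x, it takes from each remaining set the element closest to the anchor x, which may or may not match what the DP would return; the function names are mine.

```python
from itertools import combinations

def pairwise_cost(sel):
    return sum(abs(a - b) for a, b in combinations(sel, 2))

def anchor_heuristic(X):
    """Fix each element x of the smallest set as an anchor, take the element
    nearest to x from every other set, and keep the cheapest selection."""
    p = min(range(len(X)), key=lambda i: len(X[i]))   # the set Xp we remove
    rest = [X[i] for i in range(len(X)) if i != p]
    candidates = ([x] + [min(S, key=lambda v: abs(v - x)) for S in rest]
                  for x in X[p])
    return min(candidates, key=pairwise_cost)
```

On the Edit-1 example this recovers 24, 21, 22, 23 (as an unordered selection); whether the nearest-to-anchor choice is optimal in general is exactly the open question here.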

Upvotes: 2

Views: 1132

Answers (2)

mcdowella

Reputation: 19601

I don't think this is NP-complete, because I think we can find and recognize a solution with at most O(n^2m^2) guesses.

So as not to worry about ties, go from integers to reals and add a very small amount of jitter. I think of this as random, but I think you could choose deterministic jitter to achieve the same effect.

Think of the problem as that of choosing from a collection of coloured points laid out on the real line so as to pick one point of each colour and minimise the sum of absolute differences between them.

I consider only the case when n is even. The median of the set of solution points occurs between the two central points. For each point in the solution set, there is no point of the same colour between it and the two centre points (or we could improve the solution). For the same reason, each point is the closest point of that colour to the median of the n-1 points we get if we remove it from the solution set.

Using our (nm)^2 guesses we guess the identity of the two central points in the solution set. This leaves us n-2 points to find, and we can reduce the possibilities to two of each colour, the closest point to the two central points on each side of those two points.

Now consider the median formed by removing one point from the solution set. If the point we remove is on the right side of the two central points, the median is the left of those two points. If the point we remove is on the left side, the median is the right of those two points. In a solution set, the point we have just removed is closer to the new median than any other point of that colour, and that new median is the further of the two central points from it. So we can distinguish it from the other candidate of the same colour - it is whichever of the two is closest to the further of the two central points.

Therefore, by making at most O(n^2m^2) guesses we can find a possible solution set for each guess, and from these solution sets we choose the one with the smallest objective to get the global minimum. Each guess takes some work - maybe O(m) - so this may well be O(n^2m^3) with a completely naive implementation; perhaps it is mostly a theoretical approach.

Perhaps this is too good to be true, but can we turn this into a proof that it suffices to check each point in the data together with the point of each other colour closest to it? An argument would be that if we have two points, and we can recognize each point in the solution as closest to the further of the two points, then this point must also be closest to the other point in the pair. So guessing either point in the pair and finding the closest points to it might work. This begins to look a lot like a proof of Evgeny Kluev's solution.
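The check proposed in this last paragraph is cheap to try empirically (a sketch with my own naming; that the cheapest such candidate is always globally optimal is the conjecture, not a proven fact):

```python
from itertools import combinations

def pairwise_cost(sel):
    return sum(abs(a - b) for a, b in combinations(sel, 2))

def nearest_point_candidates(X):
    """For every point p in the data, form the candidate that takes from each
    colour (set) the point closest to p, and return the cheapest candidate.
    This enumerates O(nm) candidates at O(nm) work each."""
    candidates = (tuple(min(T, key=lambda v: abs(v - p)) for T in X)
                  for S in X for p in S)
    return min(candidates, key=pairwise_cost)
```

Note that when p belongs to the set T being scanned, the closest point of that colour is p itself, so each candidate automatically contains its anchor. On small instances this can be cross-checked against exhaustive search.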

Upvotes: 1

xjchengo

Reputation: 9

This can be reduced to finding a minimum-weight clique of size n in a complete n-partite weighted graph: any two nodes from different sets are joined by an edge whose weight equals the absolute value of their difference.

Upvotes: 1
