David Epstein
David Epstein

Reputation: 453

Choose rectangles with maximal intersection area

In this problem r is a fixed positive integer. You are given N rectangles, all the same size, in the plane. The sides are either vertical or horizontal. We assume the area of the intersection of all N rectangles has non-zero area. The problem is how to find N-r of these rectangles, so as to maximize the area of the intersection. This problem arises in practical microscopy when one repeatedly images a given biological specimen, and alignment changes slightly during this process, due to physical reasons (e.g. differential expansion of parts of the microscope and camera). I have expressed the problem for dimension d=2. There is a similar problem for each d>0. For d=1, an O(N log(N)) solution is obtained by sorting the lefthand endpoints of the intervals. But let's stick with d=2. If r=1, one can again solve the problem in time O(N log(N)) by sorting coordinates of the corners.

So, is the original problem solved by solving first the case (N,1) obtaining N-1 rectangles, then solving the case (N-1,1), getting N-2 rectangles, and so on, until we reduce to N-r rectangles? I would be interested to see an explicit counter-example to this optimistic attempted procedure. It would be even more interesting if the procedure works (proof please!), but that seems over-optimistic.

If r is fixed at some value r>1, and N is large, is this problem in one of the NP classes?

Thanks for any thoughts about this.

David

Upvotes: 10

Views: 1783

Answers (8)

Ed Staub
Ed Staub

Reputation: 15700

I believe this produces a perfect solution. David's solution is easier to implement, and should be faster in most cases.

This relies on the assumption that for any solution, at least one of the rejects must be a member of the complex hull. Applying this recursively leads to:

Compute a convex hull. Gather the set of all candidate solutions produced by:

{Remove a hull member, repair the hull} r times

(The hull doesn't really need to be repaired the last time.)

If h is the number of initial hull members, then the complexity is less than h^r, plus the cost of computing the initial hull. I am assuming that a hull algorithm is chosen such that the sorted data can be kept and reused in the hull repairs.

Upvotes: 0

David Epstein
David Epstein

Reputation: 453

I'm still trying to get used to this site. Somehow an earlier answer by me was truncated to two sentences. Thanks to everyone for their contributions, particularly to mhum whose counter-example to the greedy algorithm is satisfying. I now have an answer to my own question. I believe it is as good as possible, but lower bounds on complexity are too difficult for me. My solution is similar to Ed Staub's above and gives the same complexity estimates, but works for any value of r>0.

One of my rectangles is determined by its lower left corner. Let S be the set of lower left corners. In time O(N log(N)) we sort S into Sx according to the sizes of the x-coordinates. We don't care about the order within Sx between two lower left corners with the same x-coord. Similarly the sorted sequence Sy is defined by using the sizes of the y-coords. Now let u1, u2, u3 and u4 be non-negative integers with u1+u2+u3+u4=r. We compute what happens to the area when we remove various rectangles that we now name explicitly. We first remove the u1-sized head of Sx and the u2-sized tail of Sx. Let Syx be the result of removing these u1+u2 entries from Sy. We remove the u3-sized head of Syx and the u4-sized tail of Syx. One can now prove that one of these possible choices of (u1,u2,u3,u4) gives the desired maximal area of intersection. (Email me if you want a pdf of the proof details.) The number of such choices is equal to the number of integer points in the regular tetrahedron in 4-d euclidean space with vertices at the 4 points whose coordinate sum is r and for which 3 of the 4 coordinates are equal to 0. This is bounded by the volume of the tetrahedron, giving a complexity estimate of O(r^3).

So my algorithm has time complexity O(N log(N)) + O(r^3).

Upvotes: 0

David Epstein
David Epstein

Reputation: 453

I now have an algorithm, pretty similar to Ed Staub's above, with the same time estimates. It's a bit different from Ed's, since it is valid for all r

The counter-example by mhum to the greedy algorithm is neat. Take a look.

Upvotes: 0

mhum
mhum

Reputation: 2968

Here is an explicit counter-example (with N=4 and r=2) to the greedy algorithm proposed by the asker.

enter image description here

The maximum intersection between three of these rectangles is between the black, blue, and green rectangles. But, it's clear that the maximum intersection between any two of these three is smaller than intersection between the black and the red rectangles.

Upvotes: 1

Ed Staub
Ed Staub

Reputation: 15700

Postscript. This is pretty defective, but may spark some ideas. It's especially defective where there are outliers in a quadrant that are near the X and Y axes - they will tend to reinforce each other, as if they were both at 45 degrees, pushing the solution away from that quadrant in a way that may not make sense.

-

If r is a lot smaller than N, and N is fairly large, consider this:

Find the average center.
Sort the rectangles into 2 sequences by (X - center.x) + (Y - center.y) and (X - center.x) - (Y - center.y), where X and Y are the center of each rectangle.

For any solution, all of the reject rectangles will be members of up to 4 subsequences, each of which is a head or tail of each of the 2 sequences. Assuming N is a lot bigger than r, most the time will be in sorting the sequences - O(n log n).

To find the solution, first find the intersection given by removing the r rectangles at the head and tail of each sequence. Use this base intersection to eliminate consideration of the "core" set of rectangles that you know will be in the solution. This will reduce the intersection computations to just working with up to 4*r + 1 rectangles.

Each of the 4 sequence heads and tails should be associated with an array of r rectangles, each entry representing the intersection given by intersecting the "core" with the i innermost rectangles from the head or tail. This precomputation reduces the complexity of finding the solution from O(r^4) to O(r^3).

This is not perfect, but it should be close.

Defects with a small r will come from should-be-rejects that are at off angles, with alternatives that are slightly better but on one of the 2 axes. The maximum error is probably computable. If this is a concern, use a real area-of-non-intersection computation instead of the simple "X+Y" difference formula I used.

Upvotes: 1

FelixCQ
FelixCQ

Reputation: 2028

This is just a thought, but if N is very large, I would probably try a Monte-Carlo algorithm.

The idea would be to generate random points (say, uniformly in the convex hull of all rectangles), and score how each random point performs. If the random point is in N-r or more rectangles, then update the number of hits of each subset of N-r rectangles.

In the end, the N-r rectangle subset with the most random points in it is your answer.

This algorithm has many downsides, the most obvious one being that the result is random and thus not guaranteed to be correct. But as most Monte-Carlo algorithms it scales well, and you should be able to use it with higher dimensions as well.

Upvotes: -1

wizard
wizard

Reputation: 76

Since the intersection of axis-aligned rectangles is an axis-aligned rectangle, there are O(N4) possible intersections (O(N) lefts, O(N) rights, O(N) tops, O(N) bottoms). The obvious O(N5) algorithm is to try all of these, checking for each whether it's contained in at least N - r rectangles.

An improvement to O(N3) is to try all O(N2) intervals in the X dimension and run the 1D algorithm in the Y dimension on those rectangles that contain the given X-interval. (The rectangles need to be sorted only once.)

How large is N? I expect that fancy data structures might lead to an O(N2 log N) algorithm, but it wouldn't be worth your time if a cubic algorithm suffices.

Upvotes: 4

duedl0r
duedl0r

Reputation: 9424

I think I have a counter-example. Let's say you have r := N-2. I.e. you want to find two rectangles with maximum overlapping. Let's say you have to rectangles covering the same area (=maximum overlapping). Those two will be the optimal result in the end.

Now we need to construct some more rectangles, such that at least one of those two get removed in a reduction step.

Let's say we have three rectangles which overlap a lot..but they are not optimal. They have a very small overlapping area with the other two rectangles.

Now if you want to optimize the area for four rectangles, you will remove one of the two optimal rectangles, right? Or maybe you don't HAVE to, but you're not sure which decision is optimal.

So, I think your reduction algorithm is not quite correct. Atm I'm not sure if there is a good algorithm for this or in which complexity class this belongs to, though. If I have time I think about it :)

Upvotes: 2

Related Questions