Reputation: 445
I have a set of objects (each object contains a rectangle and a value assigned to it) which is kept in a vector container. See picture below:
I need to create a matrix by drawing horizontal and vertical lines at each x/y coordinate of every lower-left (LL) and upper-right (UR) corner, like below:
I need to assign value = 0 to each new empty rectangle, and each new rectangle that lies inside one of the initial rectangles must keep that rectangle's old value.
I've implemented this with a naive algorithm, but it is too slow when I have a huge number of rectangles. My algorithm basically does the following:
- Stores all rectangles in a map container. The key is the LL Y coordinate, and each element holds the set of rectangles with that LL Y coordinate, sorted by LL X coordinate.
- Stores all X/Y coordinates in set containers.
- Iterates over the Y/X coordinate containers and, for each new rectangle, checks whether it exists in the map. If it exists, the existing value is assigned to it; otherwise it gets the value 0. That is, for each new rectangle it looks up its LL Y coordinate in the map; if that Y exists, it searches the corresponding value (the set of rectangles), otherwise it searches the whole map.
Is there an efficient algorithm to get the needed result?
Upvotes: 3
Views: 753
Reputation: 23955
Assuming you know the top- and bottom-most y and the left- and right-most x, extend the four vectors belonging to each rectangle to the respective max and min x and y points. Keep a set of extended vertical vectors and a set of extended horizontal ones. Whenever an extended vector is added, it will necessarily intersect with each vector in the perpendicular list - the intersections are the cell coordinates of the matrix.
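A minimal sketch of that construction, assuming a hypothetical `Rect` struct holding the lower-left and upper-right corners (the names `GridLines`, `collectGridLines` and `makeCells` are made up for illustration). Since every extended line spans the whole bounding box, it is enough to keep the distinct x and y positions; each pair of neighbouring positions on both axes bounds one cell:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical input type: lower-left (x1, y1), upper-right (x2, y2).
struct Rect { double x1, y1, x2, y2; int value; };

// Sorted, de-duplicated positions of the extended vertical lines (xs)
// and the extended horizontal lines (ys).
struct GridLines { std::vector<double> xs, ys; };

GridLines collectGridLines(const std::vector<Rect>& rects) {
    GridLines g;
    for (const Rect& r : rects) {
        g.xs.push_back(r.x1); g.xs.push_back(r.x2);
        g.ys.push_back(r.y1); g.ys.push_back(r.y2);
    }
    auto sortUnique = [](std::vector<double>& v) {
        std::sort(v.begin(), v.end());
        v.erase(std::unique(v.begin(), v.end()), v.end());
    };
    sortUnique(g.xs);
    sortUnique(g.ys);
    return g;
}

// Every pair of neighbouring lines on both axes bounds one matrix cell.
// All cells start with value 0; values are assigned in a later pass.
std::vector<Rect> makeCells(const GridLines& g) {
    std::vector<Rect> cells;
    for (std::size_t j = 0; j + 1 < g.ys.size(); ++j)
        for (std::size_t i = 0; i + 1 < g.xs.size(); ++i)
            cells.push_back({g.xs[i], g.ys[j], g.xs[i + 1], g.ys[j + 1], 0});
    return cells;
}
```

The cells come out row by row, bottom to top, so cell (i, j) sits at index `j * (xs.size() - 1) + i`.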
Once the list of cell coordinates is made, iterate over them and assign values appropriately, looking up if they are in or out of an original rectangle. I'm not too versed in data structures for rectangles, but it seems to me that two interval trees, one for horizontal, the other for vertical, could find that answer in O(log n) time per query, where n is the number of intervals in the tree.
All together, this method seems to be O(n * log m) time, where n is the number of cell coordinates in the resultant matrix and m is the number of original rectangles.
Upvotes: 1
Reputation: 51216
For n rectangles this can be solved easily in O(n^3) time (or just O(n^2) time if at most a bounded number of rectangles intersect) by looking at the problem a different way. This should be adequate for handling up to thousands of rectangles in a few seconds.
Also, unless some other constraints are added to the problem, the latter time bound is optimal: that is, there exist inputs consisting of n non-intersecting rectangles for which on the order of n^2 smaller grid rectangles will need to be output (which of course requires at least that much time). An example of such an input is n width-1 rectangles, all having equal bottommost y co-ord and having heights 1, 2, ..., n.
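To see the quadratic blow-up concretely, here is a small sketch that counts the grid cells for that staircase input. The `Rect` struct and both function names are made up for illustration, and placing the rectangles side by side at x = 0, 1, 2, ... is my assumption, since the example does not fix the x positions:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical input type: lower-left (x1, y1), upper-right (x2, y2).
struct Rect { double x1, y1, x2, y2; };

// Number of grid cells induced by the distinct edge coordinates.
std::size_t countGridCells(const std::vector<Rect>& rects) {
    std::set<double> xs, ys;
    for (const Rect& r : rects) {
        xs.insert(r.x1); xs.insert(r.x2);
        ys.insert(r.y1); ys.insert(r.y2);
    }
    if (xs.size() < 2 || ys.size() < 2) return 0;
    return (xs.size() - 1) * (ys.size() - 1);
}

// n width-1 rectangles sharing the bottom edge y = 0, with heights 1..n.
// Assumption: rectangle k occupies x in [k, k + 1], so neighbours only
// touch along an edge and their interiors stay disjoint.
std::vector<Rect> staircase(int n) {
    std::vector<Rect> rects;
    for (int k = 0; k < n; ++k)
        rects.push_back({double(k), 0.0, double(k + 1), double(k + 1)});
    return rects;
}
```

With this layout there are n + 1 distinct vertical lines and n + 1 distinct horizontal lines, so `countGridCells(staircase(n))` is n * n.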
First of all, notice that there can be at most 2n vertical lines, and at most 2n horizontal lines, since each input rectangle introduces at most 2 of each kind (it may introduce fewer if one or both vertical lines are also the edge(s) of some already-considered rectangle, and likewise for horizontal lines). So there can be at most (2n - 1)^2 = O(n^2) cells in the grid defined by these lines.
We can invent a co-ordinate system for grid cells in which each cell is identified by its lower-left corner, and the co-ordinates of an intersection of two grid lines is given simply by the number of horizontal grid lines below it and the number of vertical grid lines to its left (so that the bottommost, leftmost grid cell has co-ords (0, 0), the cell to its right has co-ords (1, 0), the cell two cells above that cell has co-ords (1, 2), etc.)
For each input rectangle having LL co-ords (x1, y1) and UR co-ords (x2, y2), we determine the horizontal and vertical intervals that it occupies within the new grid co-ordinate system, and then simply iterate through every cell (i, j) belonging to this rectangular region (i.e., every grid cell (i, j) such that toGridX(x1) <= i < toGridX(x2) and toGridY(y1) <= j < toGridY(y2)) with a nested for loop, recording in a hashtable that the ID (colour?) for the cell at (i, j) should be the colour of the current input rectangle. Input rectangles should be processed in decreasing z-order, i.e. farthest first (implicitly at least there seems to be such an order, from your example), so that for any cell covered by more than one input rectangle, the hashtable will wind up recording whatever the "nearest" rectangle's colour is. Finally, iterate through the hash table, converting each grid co-ord pair (i, j) back to the LL and UR co-ords of the input-space rectangle that corresponds to this grid cell, and output this rectangle with the ID given by the value for this hash key.
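A compact sketch of the whole procedure, under a few assumptions of mine: the `Rect` struct and the names `sortedUnique`, `rankOf` and `paintGrid` are hypothetical, the input vector is already ordered farthest-first, and the (i, j) pair is packed into a single integer key for `std::unordered_map`. The rank lookup here uses a binary search over the sorted distinct coordinates:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical input type: lower-left (x1, y1), upper-right (x2, y2).
struct Rect { double x1, y1, x2, y2; int value; };

// Sorts and removes duplicates, giving the grid lines along one axis.
static std::vector<double> sortedUnique(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// Rank of c within the sorted distinct coordinates (c must be present).
static std::size_t rankOf(const std::vector<double>& lines, double c) {
    auto it = std::lower_bound(lines.begin(), lines.end(), c);
    assert(it != lines.end() && *it == c);
    return static_cast<std::size_t>(it - lines.begin());
}

// rects must be ordered farthest-first, so nearer rectangles overwrite.
// Returns only the covered cells; any grid cell missing from the result
// is an empty one (value 0).
std::vector<Rect> paintGrid(const std::vector<Rect>& rects) {
    std::vector<double> xs, ys;
    for (const Rect& r : rects) {
        xs.push_back(r.x1); xs.push_back(r.x2);
        ys.push_back(r.y1); ys.push_back(r.y2);
    }
    xs = sortedUnique(xs);
    ys = sortedUnique(ys);
    const std::size_t width = xs.size();  // used to pack (i, j) into one key

    std::unordered_map<std::size_t, int> colour;  // key = j * width + i
    for (const Rect& r : rects) {
        const std::size_t i1 = rankOf(xs, r.x1), i2 = rankOf(xs, r.x2);
        const std::size_t j1 = rankOf(ys, r.y1), j2 = rankOf(ys, r.y2);
        for (std::size_t j = j1; j < j2; ++j)
            for (std::size_t i = i1; i < i2; ++i)
                colour[j * width + i] = r.value;
    }

    // Convert each recorded grid cell back to input-space co-ordinates.
    std::vector<Rect> cells;
    for (const auto& entry : colour) {
        const std::size_t i = entry.first % width, j = entry.first / width;
        cells.push_back({xs[i], ys[j], xs[i + 1], ys[j + 1], entry.second});
    }
    return cells;
}
```

If you need the empty cells as well, a dense (xs.size() - 1) by (ys.size() - 1) array initialised to 0 can replace the hash table.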
In order to accomplish the above, we need two things: a way to map input-space co-ordinates to grid co-ordinates (to determine the horizontal and vertical grid intervals for a given input rectangle), and a way to map grid co-ordinates back to input-space co-ordinates (to generate the output rectangles in the final step). Both operations are easy to do via that old workhorse, sorting.
Given any corner (x, y) of some input rectangle, the grid x co-ordinate corresponding to x, toGridX(x), is simply the rank position of x within the sorted list of all distinct x positions of vertical edges that are present among the input rectangles. Similarly, toGridY(y) is just the rank position of y within the sorted list of all distinct y positions of horizontal edges that are present among the input rectangles. In the other direction, for any grid co-ordinate (i, j), the corresponding input-space x co-ordinate, fromGridX(i), is simply the i-th smallest x co-ord (ignoring duplicates) of any vertical edge among the input rectangles, and similarly for fromGridY(j). These can all be computed as follows (all array indices start at 0, and I show only how to do it for x co-ords; y co-ords are similar):
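One way to realise these two mappings for a single axis, as a rough sketch rather than the exact listing this answer refers to: the class name `AxisMap` and its members are hypothetical, and the rank is found with `std::lower_bound` on the sorted distinct coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Maps between input-space coordinates and grid ranks along one axis.
class AxisMap {
public:
    // coords: every edge coordinate on this axis, duplicates allowed.
    explicit AxisMap(std::vector<double> coords) : sorted_(std::move(coords)) {
        std::sort(sorted_.begin(), sorted_.end());
        sorted_.erase(std::unique(sorted_.begin(), sorted_.end()), sorted_.end());
    }

    // toGridX / toGridY: rank of c among the distinct coordinates.
    // c must be one of the coordinates passed to the constructor.
    std::size_t toGrid(double c) const {
        auto it = std::lower_bound(sorted_.begin(), sorted_.end(), c);
        assert(it != sorted_.end() && *it == c);
        return static_cast<std::size_t>(it - sorted_.begin());
    }

    // fromGridX / fromGridY: the i-th smallest distinct coordinate.
    double fromGrid(std::size_t i) const { return sorted_.at(i); }

    // Number of distinct grid lines on this axis.
    std::size_t lineCount() const { return sorted_.size(); }

private:
    std::vector<double> sorted_;
};
```

Construction costs O(n log n) for the sort, each `toGrid` call is O(log n), and `fromGrid` is O(1). One `AxisMap` is built from all x co-ords and a second one from all y co-ords.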
By this time, for any i, VERT[i] is an array that contains (in its second and subsequent positions) the IDs of every input rectangle that uses, as either its left or right edge, the ith-leftmost distinct vertical line used by any input rectangle -- or in other words, the rank-i vertical line. We now "invert" this:
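A possible shape for that inversion, heavily hedged: I am assuming each VERT entry can be modelled as the line's x co-ord followed by the IDs of the rectangles using it, and that rectangle IDs are 0..n-1. The names `VertLine`, `GridSpan` and `invertVert` are hypothetical. Because the lines are visited in increasing rank, the first line on which a rectangle shows up is its left edge and the second is its right edge:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Assumed model of one VERT entry: the x position of the rank-i vertical
// line, followed by the IDs of the rectangles that use it as an edge.
struct VertLine { double x; std::vector<int> rectIds; };

// Grid-space horizontal interval [lo, hi) occupied by one rectangle.
struct GridSpan { std::size_t lo, hi; };

// vert must be sorted by x. Rectangle IDs are assumed to be 0..rectCount-1,
// and every rectangle is assumed to have non-zero width.
std::vector<GridSpan> invertVert(const std::vector<VertLine>& vert,
                                 std::size_t rectCount) {
    const std::size_t unset = std::numeric_limits<std::size_t>::max();
    std::vector<GridSpan> span(rectCount, GridSpan{unset, unset});
    for (std::size_t i = 0; i < vert.size(); ++i) {
        for (int id : vert[i].rectIds) {
            GridSpan& s = span[static_cast<std::size_t>(id)];
            if (s.lo == unset) s.lo = i;  // first line seen: left edge
            else s.hi = i;                // later line: right edge
        }
    }
    return span;
}
```

After this pass, `span[id].lo` and `span[id].hi` play the roles of toGridX(x1) and toGridX(x2) for rectangle `id`; the y direction works the same way.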
As previously established, there are at most O(n^2) grid cells. Each of the n input rectangles can occupy at most all of these cells, each of which is visited once per input rectangle, for a time bound of O(n^3). Note that this is an extremely pessimistic time bound, and for example if none (or none but a bounded number) of your rectangles overlap, then it drops to O(n^2) since no grid cell will ever be visited more than once.
Upvotes: 1
Reputation: 11968
I suspect the lookups and iterations are not fast enough. Things like 'otherwise it searches the whole map' suggest that you are doing very heavy computations.
What I think you need is a 2D data structure. A k-d tree or a BSP tree would work, but the easiest to understand and implement would be a quadtree.
In a quadtree each node represents a rectangle in your space. Each node can be split into 4 children by selecting the midpoint along the 2 dimensions and having the children represent the 4 resulting rectangles. Each node also holds the value that you want to assign to the area and an extra flag indicating whether the value is uniform over the whole node.
To mark a rectangle with some value, you start from the root and recursively:
The main advantage of this approach is that you get to mark large areas of your map quickly. The tree is only O(log N) deep, where N is the side length of your map, so lookups are O(log N); marking an area also has to visit the nodes along the area's boundary, so it costs more than an update in the usual one-dimensional tree.
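A bare-bones sketch of such a tree, under assumptions of mine: integer coordinates, a square root region whose side is a power of two, and half-open query rectangles. `QuadNode`, `mark` and `valueAt` are hypothetical names, not a library API.

```cpp
#include <array>
#include <cassert>
#include <memory>

struct QuadNode {
    int value = 0;        // meaningful only while uniform is true
    bool uniform = true;  // the whole square holds a single value
    std::array<std::unique_ptr<QuadNode>, 4> child;
};

// Assigns v to [qx1, qx2) x [qy1, qy2) inside the node that covers the
// square [x, x + size) x [y, y + size). size must be a power of two.
void mark(QuadNode& node, int x, int y, int size,
          int qx1, int qy1, int qx2, int qy2, int v) {
    if (qx2 <= x || qx1 >= x + size || qy2 <= y || qy1 >= y + size)
        return;  // no overlap with this node
    if (qx1 <= x && x + size <= qx2 && qy1 <= y && y + size <= qy2) {
        node.value = v;  // fully covered: collapse to one uniform node
        node.uniform = true;
        for (auto& c : node.child) c.reset();
        return;
    }
    // Partial overlap (only possible when size > 1): split if needed,
    // handing the current value down to the new children.
    if (node.uniform) {
        for (auto& c : node.child) {
            c.reset(new QuadNode);
            c->value = node.value;
        }
        node.uniform = false;
    }
    const int half = size / 2;
    mark(*node.child[0], x,        y,        half, qx1, qy1, qx2, qy2, v);
    mark(*node.child[1], x + half, y,        half, qx1, qy1, qx2, qy2, v);
    mark(*node.child[2], x,        y + half, half, qx1, qy1, qx2, qy2, v);
    mark(*node.child[3], x + half, y + half, half, qx1, qy1, qx2, qy2, v);
}

// Value stored at the point (px, py), which must lie inside the node.
int valueAt(const QuadNode& node, int x, int y, int size, int px, int py) {
    if (node.uniform) return node.value;
    const int half = size / 2;
    const int idx = (px >= x + half ? 1 : 0) + (py >= y + half ? 2 : 0);
    return valueAt(*node.child[idx], x + (idx & 1) * half,
                   y + (idx >> 1) * half, half, px, py);
}
```

For the original problem you would first map the rectangle corners to grid ranks, mark each rectangle in z-order, and then read each cell's value back with `valueAt`; cells never marked keep the root's initial value 0.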
You can find a more detailed explanation and some helpful images on Wikipedia.
Upvotes: 1