Reputation: 51
There is a 2-d grid of arbitrary size (e.g. 1000*1000).
Here's the pattern:
The given inputs are four ints: x y w h, which stand for the (x, y) coordinate of the lower-left corner of each rectangle, plus its width and height.
Rectangles that share an edge or overlap are defined as connected.
A set of connected rectangles is defined as a cluster.
The number of rectangles covering each unit cell is defined as its thickness.
So the problem asks to calculate:
1. The maximum thickness
2. The number of clusters
3. The maximum number of elements in a single cluster
4. The maximum area of a single cluster
For example, in the picture above, the maximum thickness is 3 and there are 2 clusters. One cluster consists of 3 rectangles and its area is 44; the other has 4 elements and an area of 30. Therefore the maximum number of elements in a single cluster is 4 and the maximum cluster area is 44.
My solution is quite naive. I used a 2-d int array to represent the grid. While reading the input, I simply incremented the count in the corresponding cells. After all input was read, I used depth-first search to determine the clusters.
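For reference, here is roughly what that naive approach looks like in Python (a sketch under an assumed input format; it computes the maximum thickness, the cluster count, and the maximum cluster area via an iterative flood fill over cells — counting rectangles per cluster would need extra bookkeeping). Its cost is proportional to the grid size, which is exactly why it breaks down on large grids:

```python
def analyze(rects, W, H):
    # Paint each rectangle onto the grid, counting thickness per cell.
    grid = [[0] * W for _ in range(H)]
    for x, y, w, h in rects:
        for j in range(y, y + h):
            for i in range(x, x + w):
                grid[j][i] += 1

    max_thickness = max(max(row) for row in grid)

    # Flood-fill covered cells to find clusters (iterative DFS over cells).
    seen = [[False] * W for _ in range(H)]
    clusters = 0
    max_area = 0
    for sy in range(H):
        for sx in range(W):
            if grid[sy][sx] and not seen[sy][sx]:
                clusters += 1
                area = 0
                stack = [(sx, sy)]
                seen[sy][sx] = True
                while stack:
                    cx, cy = stack.pop()
                    area += 1  # each covered cell counts once toward area
                    for nx, ny in ((cx+1, cy), (cx-1, cy), (cx, cy+1), (cx, cy-1)):
                        if 0 <= nx < W and 0 <= ny < H and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((nx, ny))
                max_area = max(max_area, area)
    return max_thickness, clusters, max_area
```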
The problem is that my algorithm is very slow, and it quickly becomes unusable for large inputs (e.g. when the grid size goes over 1000*1000).
So I'd like to know: is there a more efficient data structure and algorithm for solving this problem?
Upvotes: 5
Views: 4160
Reputation: 1627
My suggestion would be to combine the two other answers currently given: use a spatial data structure such as an AABB tree or quadtree to store all rectangles you have processed so far, so that you can quickly find connected rectangles; and then track the connected clusters in a union-find (disjoint-set) data structure.
Once you've processed all rectangles, you can answer questions 2 and 3 in linear time. Question 1 (maximum thickness) is a little trickier: if you're not careful, you may end up doing a quadratic number of overlap tests when many rectangles overlap. The same goes for question 4.
So when you insert a rectangle that overlaps others into the spatial data structure, the best solution might be to split it into disjoint parts of constant thickness. Then questions 1 and 4 are also easy to answer relatively quickly.
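A sketch of the disjoint-set half of this suggestion in Python. The brute-force pairwise scan below is only a stand-in for the spatial-index query (all names are illustrative), and note that with closed-interval comparisons corner-only contact also passes the touch test; tighten it if strict edge sharing is required:

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n  # rectangles per cluster

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def touches(r, s):
    # Closed intervals, so sharing an edge counts as connected
    # (and so does touching at a single corner).
    ax, ay, aw, ah = r
    bx, by, bw, bh = s
    return (ax <= bx + bw and bx <= ax + aw and
            ay <= by + bh and by <= ay + ah)

def cluster_rects(rects):
    dsu = DSU(len(rects))
    for i, r in enumerate(rects):
        for j in range(i):  # stand-in for a spatial-index candidate query
            if touches(r, rects[j]):
                dsu.union(i, j)
    roots = {dsu.find(i) for i in range(len(rects))}
    biggest = max(dsu.size[dsu.find(i)] for i in range(len(rects)))
    return len(roots), biggest  # cluster count, max elements per cluster
```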
Upvotes: 1
Reputation: 3792
I think you just described spatial partitioning using a 2D AABB structure.
The input rectangles get inserted into the tree in arbitrary order; the tree's purpose is to sort them spatially.
You may need to extend the capability of the tree to fit your needs (e.g. connections, tracking clusters, layers/thickness).
The spatial partitioning tree containing the 2D AABBs can be widened, splayed, or leaf-cached to reduce access time.
2D physics engines in video games use this to effectively manage hundreds to thousands of arbitrarily shaped object collisions easily on arbitrarily large (roughly infinite) grid sizes. Their processing can be accelerated by technologies like OpenCL (for CPUs and GPUs) or other GPGPU technologies.
See this wikipedia article on AABBs.
See this wikipedia article on BSPs for the general idea of using a (binary) tree for spatial partitioning.
See this wikipedia article on quadtrees as a higher-order tree concept that might be easier to implement, and might even be faster if everything is an AABB, as it's 4x wider. You can also flatten even higher-order trees, though the benefits seem to max out at quadtrees for 2D space.
See this wikipedia article on collision detection as a way to determine connections and clusters.
See this wikipedia article on splay trees for an idea of the benefits of splaying and widening trees.
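To make the quadtree idea concrete, here is a minimal illustrative sketch (not from any particular engine): each rectangle is stored at the deepest node whose quadrant fully contains it, and queries prune whole quadrants that cannot touch the query box:

```python
class Quadtree:
    """Minimal quadtree over axis-aligned rectangles (x, y, w, h).

    A rectangle is pushed down to the deepest quadrant that fully
    contains it; rectangles straddling a boundary stay at that node.
    Nodes subdivide eagerly for simplicity.
    """
    MAX_DEPTH = 8

    def __init__(self, x, y, size, depth=0):
        self.x, self.y, self.size, self.depth = x, y, size, depth
        self.rects = []       # rectangles stored at this node
        self.children = None  # four sub-quadrants, created on demand

    def insert(self, rect):
        rx, ry, rw, rh = rect
        if self.children is None and self.depth < self.MAX_DEPTH:
            self._split()
        if self.children is not None:
            for child in self.children:
                if (child.x <= rx and rx + rw <= child.x + child.size and
                        child.y <= ry and ry + rh <= child.y + child.size):
                    child.insert(rect)  # fits entirely inside one quadrant
                    return
        self.rects.append(rect)         # straddles a boundary: keep here

    def _split(self):
        half = self.size / 2
        d = self.depth + 1
        self.children = [
            Quadtree(self.x,        self.y,        half, d),
            Quadtree(self.x + half, self.y,        half, d),
            Quadtree(self.x,        self.y + half, half, d),
            Quadtree(self.x + half, self.y + half, half, d),
        ]

    def query(self, rect):
        """Yield stored rectangles whose AABB touches or overlaps rect."""
        rx, ry, rw, rh = rect
        for sx, sy, sw, sh in self.rects:
            if (sx <= rx + rw and rx <= sx + sw and
                    sy <= ry + rh and ry <= sy + sh):
                yield (sx, sy, sw, sh)
        if self.children is not None:
            for child in self.children:
                # Skip quadrants that cannot touch the query box.
                if (child.x <= rx + rw and rx <= child.x + child.size and
                        child.y <= ry + rh and ry <= child.y + child.size):
                    yield from child.query(rect)
```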
Upvotes: 3
Reputation: 43517
After all input is read, I used depth-first search to decide the clusters.
You don't need to do this - you can decide on the clusters while reading the input, just as you do for the thickness.
For each cell, keep track of which cluster it belongs to. While iterating over the cells of the grid covered by the rectangle just read, collect the clusters those cells already belong to, and merge all clusters found during this traversal into a single cluster. You can do this efficiently, in very close to constant time per operation, using the Disjoint-set data structure.
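A rough sketch of that incremental idea in Python, with a small dict-based union-find (all names are illustrative). One caveat: the one-cell border scan used to catch edge-sharing neighbors also merges corner-touching rectangles; shrink it to exclude the four corner cells if strict edge sharing is required:

```python
def clusters_while_reading(rects, W, H):
    parent = {}  # cluster label -> parent label (dict-based union-find)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    label = [[0] * W for _ in range(H)]      # 0 = empty cell
    thickness = [[0] * W for _ in range(H)]
    next_label = 1

    for x, y, w, h in rects:
        merged = next_label
        parent[merged] = merged
        next_label += 1
        for j in range(y, y + h):
            for i in range(x, x + w):
                thickness[j][i] += 1
        # Scan one cell beyond each edge so edge-sharing rectangles merge
        # too (this also catches corner contact; see caveat above).
        for j in range(max(0, y - 1), min(H, y + h + 1)):
            for i in range(max(0, x - 1), min(W, x + w + 1)):
                if label[j][i]:
                    ra, rb = find(merged), find(label[j][i])
                    if ra != rb:
                        parent[rb] = ra
        for j in range(y, y + h):
            for i in range(x, x + w):
                label[j][i] = merged

    roots = {find(l) for row in label for l in row if l}
    max_t = max(max(row) for row in thickness)
    return len(roots), max_t  # cluster count, maximum thickness
```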
However, your entire problem might be better solved by a sweep line algorithm.
Upvotes: 1