Reputation: 646

Idea for a data structure to store 2D data?

I have a large 2D grid, x-by-y. The user of the application will add data about specific points on this grid. Unfortunately, the grid is far too big to be implemented as a large x-by-y array because the system on which this is running does not have enough memory.

What is a good way to implement this so that only the points that have data added to them are stored in memory?

My first idea was to create a BST of the data points. A hash function such as "(long)x<<32 + y" would be used to compare the nodes.

I then concluded that this could lose efficiency if not well balanced so I came up with the idea of having a BST of comparable BSTs of points. The outer BST would compare the inner BSTs based on their x values. The inner BSTs would compare the points by their y values (and they would all have the same x). So when the programmer wants to see if there is a point at (5,6), they would query the outer BST for 5. If an inner BST exists at that point then the programmer would query the inner BST for 6. The result would be returned.

Can you think of any better way of implementing this?

Edit: In regards to HashMaps: Most HashMaps require having an array for the lookup. One would say "data[hash(Point)] = Point();" to set a point and then find the Point by hashing it to find the index. The problem, however, is that the array would have to be the size of the range of the hash function. If this range is less than the total number of data points that are added then they would either have no room or have to be added to an overflow. Because I don't know the number of points that will be added, I would have to make an assumption that this number would be less than a certain amount and then set the array to that size. Again, this instantiates a very large array (although smaller than originally if the assumption is that there will be less data points than x*y). I would like the structure to scale linearly with the amount of data and not take up a large amount when empty.

It looks like what I want is a SparseArray, as some have mentioned. Are they implemented similarly to having a BST inside of a BST?

Edit2: Map<> is an interface. If I were to use a Map then it looks like TreeMap<> would be the best bet. So I would end up with TreeMap< TreeMap< Point> >, similar to the Map< Map< Point> > suggestions that people have made, which is basically a BST inside of a BST. Thanks for the info, though, because I didn't know that the TreeMap<> was basically the Java SDK of a BST.

Edit3: For those whom it may concern, the selected answer is the best method. Firstly, one must create a Point class that contains (x,y) and implements comparable. The Point could potentially be compared by something like (((long)x)<<32)+y). Then one would TreeMap each point to the data. Searching this is efficient because it is in a balanced tree so log(n) cost. The user can also query all of this data, or iterate through it, by using the TreeMap.entrySet() function, which returns a set of Points along with the data.

In conclusion, this allows for the space-efficient and search-efficient implementation of a sparse array, or in my case, a 2D array, that can also be iterated through efficiently.

Upvotes: 9

Answers (8)

AlexWien

Reputation: 28727

Either a Quadtree, a k-d-tree or an R-tree.

Store index to large point array into one of the spatial structures. Such spatial structures are advantageous if the data is not equally distributed, like geographic data that concentrates in cities, and have no point in the sea.

Think if you can forget the regular grid, and stay with the quad tree.
(Think, why do you need a regular grid? A regular grid is usually only a simplification)

Under no circumstances use Objects to store a Point. Such an Object needs 20 bytes only for the fact that it is an object! A bad idea for a huge data set.

An int x[], and int[] y, or an int[]xy array is ideal related to memory usage.

Consider reading

Hanan Samet's "Foundations of Multidimensional Data Structures"

(at least the Introduction).

Upvotes: 8

Daniel De León

Reputation: 13639

My suggestion to you is use Commons Math: The Apache Commons Mathematics Library. Because it will save your day, by leveraging the math force that your application require.

Upvotes: 0

jmruc

Reputation: 5836

You could use a Map<Pair, Whatever> to store your data (you have to write the Pair class). If you need to iterate the data in some specific order, make Pair Comparable, and use NavigableMap

Upvotes: 4

Guillaume

Reputation: 14656

You might want to look at FlexCompColMatrix, CompColMatrix and other sparse matrices implementations from the Matrix toolkit project.

The performance will really depends on the write/read ratio and on the density of the matrix, but if you're using a matrix package it will be easier to experiment by switching the implementation

Upvotes: 0

Vivin Paliath

Reputation: 95508

One approach could be Map<Integer, Map<Integer, Data>>. The key on the outer map is the row value, and the key in the inner map is the column value. The value associated with that inner map (of type Data in this case) corresponds to the data at (row, column). Of course, this won't help if you're looking at trying to do matrix operations or such. For that you'll need sparse matrices.

Another approach is to represent the row and column as a Coordinate class or a Point class. You will need to implement equals and hashCode (should be very trivial). Then, you can represent your data as Map<Point, Data> or Map<Coordinate, Data>.

Upvotes: 2

robjohncox

Reputation: 3665

I think you are on the right track to do this in a memory efficient way - it can be implemented fairly easily by using a map of maps, wrapped in a class to give a clean interface for lookups.

An alternative (and more memory efficient) approach would be to use a single map, where the key was a tuple (x,y). However, this would be less convenient if you need to make queries like 'give me all values where x == some value'.

Upvotes: 0

MaQy

Reputation: 476

Maybe I'm being too simplistic here, but I think you can just use a regular HashMap. It would contain custom Point objects as keys:

class Point {
    int x;
    int y;
}

Then you override the equals method (and thus the hashCode method) to be based on x and y. That way you only store points that have some data.

Upvotes: 0

Sam I am says Reinstate Monica

Reputation: 31184

You could have a list of lists of an object, and that object can encode it's horizontal and vertical position.

class MyClass
{
    int x;
    int y;
    ...
}

Upvotes: 1

Idea for a data structure to store 2D data?

Answers (8)

Related Questions