Reputation: 47
I'm just starting to write a new project which should read in X and Y coordinates (which can go from 0 to 4,000,000,000) and store those points.
Because I don't know how many points I'll have to store, I have to allocate the memory needed while the program is running.
Then I'd like to find neighboring points (horizontal&vertical) to create "islands" and store all connected points in a group.
Now I'm not quite sure which datatype or how to store the coordinates at all. My very first idea was to have like a 2d array[x][y] which stores the state of that exact point. So if I'm on position array[5][5] i could easily request the state of the next point by adding x+1 (array[6][5]). Problem with that is I would have to initialize an array which also holds points, that are not occupied at all and I think array[4,000,000,000][4,000,000,000] wouldn't work anywas.
So what would be the best storage so I'm able to read a points state which allows me to find the neighbouring points?
thanks in advance, D
Edit: Also each island can have gaps
Upvotes: 2
Views: 114
Reputation: 44329
Selecting a data structure requires a lot of information about what the program is to do and how it'll be used. You have given us some information but more information will be needed to find the best data structure.
With the information given my first idea is a linked list of linked lists. The "outer" linked list could represent rows that are present (sorted) and the "inner" linked lists would represent the columns that are present (sorted) for that row.
Something like:
struct column
{
uint32_t cidx;
struct column* next;
}
struct row
{
uint32_t ridx;
struct column* column;
struct row* next;
}
The benefit of linked lists is that it's easy to insert elements in sorted order. Having the points in sorted order will help you look for "islands". For instance if you are looking at row 18, column 1000 and want to check if row 19, column 1000 exists you first check if the next
row-element is for row 19 and - if it is - you go through the corresponding column linked list to see if 1000 is present.
Upvotes: 1
Reputation: 144951
Using a 2D array indexed by the coordinates is impractical because of the range of these coordinates: 0..4000000000
. That is 16 billion billion cells, not only does it exceed current conceivable RAM space, but scanning such a vast almost empty area is going to be very inefficient too.
You should look into Quadtrees or similar data structures fit for storing geographical data.
Upvotes: 1