mgfernan
mgfernan

Reputation: 815

Suggestions on a data structure

I've been struggling with finding a suitable data structure that meets these requirements:

  1. The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.
  2. The number of insertions are not frequent.
  3. Searches are by far more frequent.
  4. Retrieving the index of the element within the data structure is critical.
  5. Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.

With this in mind, I have these considerations:

Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.

A possible solution that I was considering (in C) was to keep an array of pointers to the elements that maintain the insertion order (or the order I need) and then a second array of pointers that sort the elements using a hash function. While the first array can be used to quickly access the elements and finding the neighbours, the second array can be used to quickly search the elements. But somehow I have the impression of overly complicating things and I do not want to reinvent the wheel.

What do you think? Any suggestion would be more than appreciated.

Many thanks

Upvotes: 4

Views: 207

Answers (3)

Lundin
Lundin

Reputation: 213408

Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.

This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.

Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.

Upvotes: 0

dbush
dbush

Reputation: 223699

An array would probably be the best data structure in this case.

Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1) retrieval by index and O(log n) searches.

Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.

So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.

Upvotes: 2

kfx
kfx

Reputation: 8537

What about a search tree? It's a good fit for all of your requirements except 4.

To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.

Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.

Upvotes: 0

Related Questions