Max
Max

Reputation: 20004

Array-like datastructure performance

I'm trying to optimize a data structure which is heavily used. Right now this data structure is implemented as a dynamic array of composite nodes. A typical usage is sequential accesss from the first element to the last element with reading and modification of constituent values (integers and doubles). Typical array size is from 10 to 200 elements. Another type of operation is insertions/removals of elements of this array. After some profiling I found out that insertions are very expensive in terms of overall algorithm performance, so I decided o change this array to one of the two datastructures:

  1. Array of pointers to elements
  2. Array of indices to another array containing actual elements

This way I will only do insertions and removals in the index/pointer array which is much cheaper.

The second datastructure is much more complicated than the first one, it will require additional operations to avoid reallocations in element array, but it will keep all the elements in the same memory region which I think will be better for processor cache. My question is which way is better? And which is the best way to implement 2 variant to keep all the array elements in the same memory region?

Upvotes: 0

Views: 342

Answers (3)

templatetypedef
templatetypedef

Reputation: 372714

There is a very elegant data structure called a tiered vector which supports O(1) worst-case lookup, O(1) amortized append, and O(sqrt n) amortized insertion at any point in the structure. Moreover, it's very memory-efficient, wasting only O(sqrt n) overhead at any point (compared to the O(n) overhead from a dynamic array). I don't know how useful this might be in your particular instance, but a description can be found here:

http://www.cs.brown.edu/cgc/jdsl/papers/tiered-vector.pdf

Upvotes: 1

Chris Hopman
Chris Hopman

Reputation: 2102

How the insertion/deletions are done is sort of unclear, but by your current description you should be using a simple linked list.

Specifically, you say the use is iterating through the list, inserting and deleting. If the insertions are done at the beginning, end, or at the iterator then a linked list can do them all in constant time. Your requirements do not seem to include random access or an ordering.

If you are really worried about processor cache (which, at this point, you probably shouldn't be) you can allocate the list nodes from an array-based pool.

Upvotes: 0

David Weiser
David Weiser

Reputation: 5195

Have you thought about using a skip list, specifically an "indexable skip list"? An indexable skip list gives you O(logn) insertion, removal and lookup operations. Furthermore, it is similar to the current array-based implementation so it won't fundamentally change how the data structure is thought of.

Upvotes: 0

Related Questions