Jacques

Reputation: 319

Heap allocation vs std::vector

I need to create a multi-tree in a critical part of my code. The standard way is to use something like:

struct node{ // or class
   ... data ...
  std::unordered_set<node*> children;
};

node* root;

One important thing is that, once a node is created, it won't be deleted until the whole tree is deleted.

The obvious way is to use new and delete to manage this data on the heap. Done.

I am wondering whether it is worth exploring the performance of using a std::vector instead of new/delete directly:

struct node{ // or class
   ... data ...
  std::unordered_set<uint32_t> children;
};

std::vector<node> tree;
uint32_t root; // probably always 0, the first element of the vector

Deleting the whole tree would simply be done with tree.clear();
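
For clarity, a minimal sketch of what I have in mind (add_node() is just a hypothetical helper, and the int payload stands in for the real data):

#include <cstdint>
#include <unordered_set>
#include <vector>

struct node {
    int value = 0;                                   // placeholder for "... data ..."
    std::unordered_set<std::uint32_t> children;      // indices into the tree vector
};

std::vector<node> tree;

// Appends a new node, hooks it under its parent, and returns its index.
std::uint32_t add_node(std::uint32_t parent, int value) {
    tree.push_back(node{value, {}});
    auto idx = static_cast<std::uint32_t>(tree.size() - 1);
    tree[parent].children.insert(idx);
    return idx;
}

int main() {
    tree.push_back(node{});              // root, index 0
    std::uint32_t a = add_node(0, 1);
    add_node(a, 2);
    tree.clear();                        // deletes the whole tree at once
}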

Is there any chance that it would be faster for big (or huge) trees?

Upvotes: 2

Views: 575

Answers (2)

darune

Reputation: 10962

I'm guessing you have a tree on the order of 10 million+ elements (or at least in the ballpark of 100k+ elements) for this to be a concern.

It then all depends on how your data and tree are used. Spatial locality of data is usually the concern, but it's not obvious whether that applies in your case too. If the tree is traversed a lot after being built (as opposed to being modified a lot) and only modified a little, then it makes sense to organize the data so that it is iterated contiguously (as a std::vector usually is). In that case you do not want pointers like that, or rather you certainly do not want single-element new/delete, as that is almost guaranteed to bog you down (you cannot count on the elements being placed contiguously in memory) - but it all depends on your use case(s).

A simple and radically different approach would be:

std::vector< std::pair<PathInTree, Node> > tree;

Then keep it sorted so that the Nodes appear in the order in which you iterate the tree. This is a somewhat extreme solution and has drawbacks, including that inserting can become expensive if there's a lot of random insertion going on. I also left out how to implement PathInTree, but it should usually be some kind of sequence (like in a filesystem structure).
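
A minimal sketch of that idea, assuming the path is simply the sequence of child indices from the root (so the lexicographic order of paths matches a depth-first traversal):

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using PathInTree = std::vector<std::uint32_t>; // child indices from the root

struct Node {
    int data = 0; // placeholder payload
};

std::vector<std::pair<PathInTree, Node>> tree;

// Keeps the vector sorted by path; expensive if insertions are random.
void insert(const PathInTree& path, const Node& n) {
    auto pos = std::lower_bound(tree.begin(), tree.end(), path,
        [](const auto& entry, const PathInTree& p) { return entry.first < p; });
    tree.insert(pos, {path, n});
}

int main() {
    insert({}, Node{0});     // root
    insert({0}, Node{1});    // first child of root
    insert({0, 0}, Node{2}); // grandchild
    // Iterating tree now visits the nodes depth-first, contiguously in memory.
}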

Upvotes: 2

Ext3h

Reputation: 6298

Considering that each instance of std::unordered_set<uint32_t> is still going to result in at least one more allocation on the heap (internally), the actually achievable savings have a hard bound: you are going to save less than 50% of the necessary heap allocations.

One important thing is that, once a node is created, it won't be deleted until the whole tree is deleted.

Re-allocations still happen in a std::vector as it grows. In contrast to immovable allocations on the heap, this additionally involves moving already-allocated nodes. Depending on what other attributes your node elements have, that may be even costlier. You will also need to make sure that your node class is compatible with move semantics.

With std::unordered_set also allocating on the heap, you won't be able to avoid the resulting memory fragmentation either.

All in all, unless you expect to be limited by the performance of heap allocations, or you have a use case for order-independent tree traversal, compacting node instances is likely not going to be worth the effort.


A potentially reasonable optimization would be to swap std::unordered_set<node*> for std::unordered_set<node>. Unless you need that indirection in that place (external pointers?), that saves you just as much as your own attempt, without sacrificing debug capabilities.

Implementing the necessary hash function and equality operator for node is usually easy.
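
For example, a minimal sketch assuming the node is identified by a single uint32_t id (whatever actually identifies your node would go into the hash and comparison instead; the children member is omitted here for brevity):

#include <cstdint>
#include <functional>
#include <unordered_set>

struct node {
    std::uint32_t id = 0; // stand-in for the identifying part of "... data ..."

    bool operator==(const node& other) const {
        return id == other.id; // compare only the identifying data
    }
};

namespace std {
template <>
struct hash<node> {
    std::size_t operator()(const node& n) const noexcept {
        return std::hash<std::uint32_t>{}(n.id);
    }
};
}

int main() {
    std::unordered_set<node> children; // nodes stored by value, no pointer indirection
    children.insert(node{42});
    return children.count(node{42}) == 1 ? 0 : 1;
}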


If performance is a real concern, and you also have a (reasonable) upper bound on the number of children per node, you may consider replacing std::unordered_set<uint32_t> children with std::array<uint32_t, N> children, using std::find() for detecting collisions.

The reason for that is quite simple: it ensures that node becomes TriviallyCopyable, which reduces re-allocations to a plain memcpy internally. For up to 16 children or so, this is likely going to be neutral in terms of memory consumption, but still faster.
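
For illustration, a minimal sketch of that layout, assuming a hypothetical upper bound of 16 children per node (add_child() is just a convenience helper, not a library function):

#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

constexpr std::size_t kMaxChildren = 16; // assumed upper bound per node

struct node {
    // ... data ...
    std::array<std::uint32_t, kMaxChildren> children{}; // indices into the tree vector
    std::size_t child_count = 0;

    // Returns false if the child is already present (collision) or the node is full.
    bool add_child(std::uint32_t idx) {
        auto end = children.begin() + child_count;
        if (std::find(children.begin(), end, idx) != end) return false;
        if (child_count == kMaxChildren) return false;
        children[child_count++] = idx;
        return true;
    }
};

// Trivially copyable nodes let vector growth degrade to a plain memcpy.
static_assert(std::is_trivially_copyable_v<node>, "node should stay trivially copyable");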

Upvotes: 1
