Daggaz

Reputation: 9

Data-Oriented Design: how do I optimize a data structure in C++ for performance?

I would like to have a class holding a varying number n of objects that can easily be iterated over as a group, with each object having a large set (20+) of individually modified variables that influence the class methods. Before I started learning OOP, I would just make a 2D array, load the variable values into each row (one row per object), and append/delete rows as needed. Is this still a good solution? Is there a better one?

Again, in this case I am more interested in pushing processor performance than in preserving abstraction, modularity, etc. In this respect, I am very confused about how the data container is ultimately read into the L1 cache, and how to ensure that I do not induce paging inefficiency or cache misses. If, for example, I have a 128 KB cache, I assume the entire container should fit into this cache to be efficient, correct?

Upvotes: 0

Views: 1242

Answers (1)

Andreas Wenzel

Reputation: 25031

According to Agner Fog's optimization manual, the C++ Standard Template Library is rather inefficient, because it makes extensive use of dynamic memory allocation. However, a fixed-size array that is made larger than necessary (e.g. because the needed size is not known at compile time) can also be bad for performance, because a larger size means that it won't fit into the cache as easily. In such situations, the STL's dynamic memory allocation could perform better.

Generally, it is best to store your data in contiguous memory. You can use a fixed-size array or a std::vector for this. However, before filling a std::vector, you should call std::vector::reserve() for performance reasons, so that the memory does not have to be reallocated too often. Frequent reallocation can also fragment the heap, which is likewise bad for cache performance.

Ideally, the data that you are working on will fit entirely into the Level 1 data cache (which is about 32 KB on modern desktop processors). However, even if it doesn't fit, the Level 2 cache is much larger (about 512 KB) and the Level 3 cache is several megabytes. The higher-level caches are still significantly faster than reading from main memory.

It is best if your memory access patterns are predictable, so that the hardware prefetcher can do its work best. Sequential memory accesses are easiest for the hardware prefetcher to predict.

The CPU cache works best if you access the same data several times and if the data is small enough to be kept in the cache. However, even if the data is used only once, the CPU cache can still make the memory access faster, by making use of prefetching.

A cache miss will occur if

  1. the data is being accessed for the first time and the hardware prefetcher was not able to predict and prefetch the needed memory address in time, or
  2. the data is no longer cached, because the cache had to make room for other data, due to the data being too large to fit in the cache.

In addition to the hardware prefetcher attempting to predict needed memory addresses in advance (which is automatic), it is also possible for the programmer to explicitly issue a software prefetch. However, from what I have read, it is hard to get significant performance gains from doing this, except under very special circumstances.

Upvotes: 0
