tommyk
tommyk

Reputation: 3307

Data cache implications of using std::make_shared()

I read here that:

make_shared is (in practice) more efficient, because it allocates the reference control block together with the actual object in one single dynamic allocation. By contrast, the constructor for shared_ptr that takes a naked object pointer must allocate another dynamic variable for the reference count

Does it mean that vector of std::shared_ptr created using std::make_shared will be "cache-friendly" as the data (control block and real pointer's data) are in one chunk ?

My use case is a vector of 100 000 shared pointers where object pointed to is 14 bytes.

Upvotes: 2

Views: 1095

Answers (3)

David Schwartz
David Schwartz

Reputation: 182763

It is impossible to make a vector of shared pointers created with make_shared. Try it, you cannot do it. The best you can do is copy construct or copy assign the pointers in the vector from shared pointers made with make_shared. But then they will be somewhere else in memory.

However, the control blocks will still be near the object. When you call make_shared, you actually make three things: an object, a shared pointer control block to track the references to the object, and a shared pointer. The make_shared function causes the control block and the object itself to be allocated in a single contiguous memory block.

Whether that's cache friendly or not is an interesting question. Basically, it depends how you use the object.

If you frequently operate only on the shared pointers and not on the objects they point to (for example, duplicating the vector and thus incrementing the reference count on each shared pointer), then separate allocations will probably be more cache friendly, not the combined ones that make_share gives you.

If you frequently operate on the objects themselves every time you operate on the shared pointers, then make_shared should be more cache friendly under typical circumstances.

Upvotes: 1

std''OrgnlDave
std''OrgnlDave

Reputation: 3968

As an above poster mentioned, making an object with make_shared makes the "control block" adjacent to the object referred to.

In your case however I believe this to be a poor choice.

When you allocate memory, even in a big block, you have no guarantee to get contiguous "physical space" as opposed to sparse, fragmented page allocations. For this reason, iterating through your list would cause reads across large spans of memory just to get the control structures (which then point to the data).

"But my cache lines are 64 bytes long!" you say. If this is true, you might think, "this will mean that the object is loaded into cache along with the control structure," but that is not necessarily true. That depends on many things such as data alignment, cache line size, associativity of the cache, and the actual memory bandwidth you use.

The problem you run into is the fact that first the control structure needs to be fetched to figure out where the data is, when instead, that could be residing already in cache, so part of your data (the control structure) could at least be practically guaranteed to be in cache if you allocate them all together instead of with make_shared.

If you want to make your data cache-friendly, you want to make sure that all the references to it fit inside the highest-level cache possible. Continuing to use it will help to make sure it stays in cache. The cache algorithms are sophisticated enough to handle fetching your data unless your code is very branch-heavy. This is the other part of making your data "cache friendly:" use as few branches as possible when working on it.

Also, when working on it, try to break it up into pieces that fit in cache. Only operate on 32k of it at a time if possible - that is a conservative number on modern processors. If you know exactly which CPU you will be running your code on, you can optimize it less conservatively, if you need to.

EDIT: I forgot to mention a pertinent detail. The most frequent allocated page size is 4k. Caches are often "associative," especially in lower-end processors. 2-way associative means that each location in memory can only be mapped to every other cache entry; 4-way associative means it can be fit into any of 4 possible mappings, 8-way means any of 8 possible mappings etc. The higher the associativity the better for you. The fastest cache (L1) on a processor tends to be the least associative since it requires less control logic; having contiguous blocks of data to reference (such as contiguous control structures) is a good thing. Fully associative cache is desirable.

Upvotes: -1

Steve Jessop
Steve Jessop

Reputation: 279245

Maybe, but don't count on it.

For cache-friendliness, you want to use as little memory as possible, and you want operations that are close together in address to also be close together in time (that is, close enough that the second operation uses memory that is still in some level of cache from the effects of the first operation: the lower the level of cache the better).

If you use make_shared, then there might well be a slight saving in total memory use, which at least tends to be a win for the cache no matter what your memory usage pattern.

If you use make_shared, then the control block and the object referred to (referand) will be adjacent in memory.

If you don't use make_shared, and your objects are a different size from your control blocks, then with common memory allocators there's a reasonable chance that the objects will be clustered together in one place and the control blocks clustered together in a different place. If they are the same size (once rounded by the memory allocator in some implementation-specific way), then with common memory allocators there's a reasonable chance that they'll just alternate in memory for long runs unless shared_ptr does something to affect that.

Your memory access pattern will determine which of those layouts is better for the cache -- and of course the actual layout you get in the non-make_shared case might be something else again, depending on implementation details.

The fact that you have a vector is basically independent of all this, since the shared_ptr objects are separate from the control-blocks and the referands.

Upvotes: 1

Related Questions