Reputation: 3307
I read here that:
make_shared is (in practice) more efficient, because it allocates the reference control block together with the actual object in one single dynamic allocation. By contrast, the constructor for shared_ptr that takes a naked object pointer must allocate another dynamic variable for the reference count
Does that mean a vector of std::shared_ptr created using std::make_shared will be "cache-friendly", since the data (control block and the pointed-to object) live in one chunk?
My use case is a vector of 100 000 shared pointers, where each pointed-to object is 14 bytes.
Upvotes: 2
Views: 1095
Reputation: 182763
It is impossible to make a vector of shared pointers created with make_shared. Try it; you cannot do it. The best you can do is copy-construct or copy-assign the pointers in the vector from shared pointers made with make_shared. But then they will be somewhere else in memory.
However, the control blocks will still be near the objects. When you call make_shared, you actually make three things: an object, a shared-pointer control block to track the references to the object, and a shared pointer. The make_shared function causes the control block and the object itself to be allocated in a single contiguous memory block.
Whether that's cache friendly or not is an interesting question. Basically, it depends how you use the object.
If you frequently operate only on the shared pointers and not on the objects they point to (for example, duplicating the vector and thus incrementing the reference count on each shared pointer), then separate allocations will probably be more cache friendly, not the combined ones that make_shared gives you.
If you frequently operate on the objects themselves every time you operate on the shared pointers, then make_shared should be more cache friendly under typical circumstances.
Upvotes: 1
Reputation: 3968
As an above poster mentioned, making an object with make_shared makes the "control block" adjacent to the object referred to.
In your case however I believe this to be a poor choice.
When you allocate memory, even in a big block, you have no guarantee to get contiguous "physical space" as opposed to sparse, fragmented page allocations. For this reason, iterating through your list would cause reads across large spans of memory just to get the control structures (which then point to the data).
"But my cache lines are 64 bytes long!" you say. If this is true, you might think, "this will mean that the object is loaded into cache along with the control structure," but that is not necessarily true. That depends on many things such as data alignment, cache line size, associativity of the cache, and the actual memory bandwidth you use.
The problem you run into is that the control structure must be fetched first just to find out where the data is. If you allocated all the control structures together in their own contiguous block, instead of interleaving each one with its object as make_shared does, then that part of your data (the control structures) could at least be practically guaranteed to already be in cache as you iterate.
If you want to make your data cache-friendly, you want to make sure that all the references to it fit inside the highest-level cache possible. Continuing to use it will help to make sure it stays in cache. The cache algorithms are sophisticated enough to handle fetching your data unless your code is very branch-heavy. This is the other part of making your data "cache friendly:" use as few branches as possible when working on it.
Also, when working on it, try to break it up into pieces that fit in cache. Only operate on 32k of it at a time if possible - that is a conservative number on modern processors. If you know exactly which CPU you will be running your code on, you can optimize it less conservatively, if you need to.
EDIT: I forgot to mention a pertinent detail. The most common page size is 4k. Caches are usually set-associative, especially in lower-end processors. 2-way set-associative means each memory location can be cached in either of 2 possible cache entries; 4-way means any of 4 possible entries; 8-way means any of 8, and so on. The higher the associativity, the better for you. The fastest cache (L1) on a processor tends to be the least associative, since it requires less control logic; having contiguous blocks of data to reference (such as contiguous control structures) is a good thing. Fully associative cache is desirable.
Upvotes: -1
Reputation: 279245
Maybe, but don't count on it.
For cache-friendliness, you want to use as little memory as possible, and you want operations that are close together in address to also be close together in time (that is, close enough that the second operation uses memory that is still in some level of cache from the effects of the first operation: the lower the level of cache the better).
If you use make_shared, then there might well be a slight saving in total memory use, which at least tends to be a win for the cache no matter what your memory usage pattern.
If you use make_shared, then the control block and the object referred to (referand) will be adjacent in memory.
If you don't use make_shared, and your objects are a different size from your control blocks, then with common memory allocators there's a reasonable chance that the objects will be clustered together in one place and the control blocks clustered together in a different place. If they are the same size (once rounded by the memory allocator in some implementation-specific way), then with common memory allocators there's a reasonable chance that they'll just alternate in memory for long runs, unless shared_ptr does something to affect that.
Your memory access pattern will determine which of those layouts is better for the cache -- and of course the actual layout you get in the non-make_shared case might be something else again, depending on implementation details.
The fact that you have a vector is basically independent of all this, since the shared_ptr objects are separate from the control blocks and the referands.
Upvotes: 1