Reputation: 1609
I am wondering how a std::vector of pointers to objects affects the performance of a program, versus a std::vector containing the objects directly. Specifically, I am referring to the speed of the program.
I was taught to use std::vector over other STL containers such as std::list for its speed, since all of its data is stored contiguously in memory rather than being fragmented. This means that iterating over the elements is fast; however, my thinking is that if my vector contains pointers to the objects, then the objects themselves can still be stored anywhere in memory, and only the pointers are stored contiguously. I am wondering how this affects the performance of a program when it comes to iterating over the vector and accessing the objects.
My current project design uses a vector of pointers so that I can take advantage of virtual functions; however, I'm unsure whether this is worth the speed hit I may encounter when my vector becomes very large. Thanks for your help!
Upvotes: 1
Views: 1606
Reputation: 880
If you need the polymorphism, then as others have said, you should store pointers to the base class. Later, if you decide this code is hot and needs its CPU cache usage optimised, you can do that, say, by making the objects fit cleanly into cache lines and/or by using a custom allocator to ensure locality of the dereferenced data.
Slicing is what happens when you store objects by value as Base and copy-construct or assign a Derived into them: the Derived will be sliced. The copy constructor or assignment operator only takes a Base and will ignore any data in Derived, because there isn't enough space allocated in a Base to hold the full size of a Derived. I.e. if Base is 8 bytes and Derived is 16, there's only enough room for Base's 8 bytes in the destination value, even if you provided a copy constructor/assignment operator that explicitly took a Derived.
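A minimal sketch of slicing in action (Base and Derived here are made-up types for illustration):

```cpp
#include <iostream>
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual const char* name() const { return "Base"; }
    int a = 0;
};

struct Derived : Base {
    const char* name() const override { return "Derived"; }
    int b = 0; // extra data with nowhere to go inside a Base
};

int main() {
    std::vector<Base> by_value;
    by_value.push_back(Derived{});           // sliced: only the Base part is copied
    std::cout << by_value[0].name() << '\n'; // prints "Base" - the Derived part is gone

    std::vector<Base*> by_pointer;
    Derived d;
    by_pointer.push_back(&d);                   // the full object survives
    std::cout << by_pointer[0]->name() << '\n'; // prints "Derived"
}
```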
I should say it's really not worth thinking about data cache behaviour if you're using virtual dispatch heavily in ways that won't be elided by the optimiser. An instruction cache miss is far more devastating than a data cache miss, and virtual dispatch can cause instruction cache misses because the CPU has to look up the vtable pointer before it can load the function into the instruction cache, so it can't preemptively load it.
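To make the indirection concrete, a sketch of what a virtual call in a loop involves (the types are made up):

```cpp
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual void step() = 0;
};

void run(std::vector<Base*>& objects) {
    for (Base* o : objects)
        o->step(); // indirect call: read o's vtable pointer, then the
                   // function address from the vtable, before the CPU
                   // knows which instructions to fetch next
}
```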
CPUs like to preload as much data as they can into their caches: if you load an address, the entire cache line (~64 bytes) around it is loaded into the cache, and often the lines before and after it are prefetched as well, which is why people are so keen on data locality.
So in your vector-of-pointers scenario, when you load the first pointer you'll get lots of pointers into the cache at once, but loading through each pointer will trigger a cache miss and pull in the data around that object. If your actual particles are 16 bytes and local to each other, you won't lose much beyond this; if they're huge and scattered all over the heap, the cache will churn on every iteration and only be relatively calm while you're working within a single particle.
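Here's a sketch of the two layouts side by side (Particle is a made-up 16-byte type):

```cpp
#include <memory>
#include <vector>

struct Particle { float x, y, dx, dy; }; // 16 bytes

int main() {
    // Values: the particle data itself is contiguous; iteration is
    // one linear sweep through memory.
    std::vector<Particle> values(10000);

    // Pointers: only the pointers are contiguous; each particle lives
    // wherever the allocator put it, so every dereference can miss.
    std::vector<std::unique_ptr<Particle>> pointers;
    for (int i = 0; i < 10000; ++i)
        pointers.push_back(std::make_unique<Particle>());

    float sum = 0.f;
    for (const Particle& p : values)
        sum += p.x;              // prefetcher-friendly
    for (const auto& p : pointers)
        sum += p->x;             // extra dereference per element
}
```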
Traditionally, particle systems tend to be very hot and like to pack data tightly; it's common to see 16-byte Plain Old Data particles which you iterate over linearly with very predictable branching. That means you can generally rely on four particles per cache line and have the prefetcher stay well ahead of your code.
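For instance, a hypothetical tightly packed particle along those lines, with the packing assumptions checked at compile time:

```cpp
#include <type_traits>

struct Particle {
    float x, y;   // position
    float dx, dy; // velocity
};

static_assert(sizeof(Particle) == 16,
              "exactly four particles per 64-byte cache line");
static_assert(std::is_trivially_copyable<Particle>::value,
              "plain old data, safe to pack and copy bytewise");
```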
I should also say that CPU caches are CPU-dependent, and I'm focusing on Intel x86 here. ARM, for example, tends to be quite a bit behind Intel: the pipeline is less complex and the prefetcher less capable, so cache misses can be less devastating.
Upvotes: 3