Does a vector help with cache locality? (C++)

Question

Last week I read about great concepts such as cache locality and pipelining in a CPU. Although these concepts are easy to understand, I have two questions. Suppose one can choose between a vector of objects or a vector of pointers to objects (as in this question).

One argument for using pointers is that shuffling larger objects may be expensive. However, I can't find out when an object should be considered large. Is an object of several bytes already large?

An argument against pointers is the loss of cache locality. Will it help to use two vectors, where the first contains the objects and is never reordered, and the second contains pointers to these objects? Say we have a vector of 200 objects, create a vector of pointers to these objects, and then randomly shuffle the pointer vector. Is cache locality then lost if we loop over the vector of pointers?

This last scenario happens a lot in my programs: I have City objects and around 200 vectors of pointers to these Cities. To avoid having 200 copies of each City, I use vectors of pointers instead of vectors of Cities.

Mats Petersson · Accepted Answer

There is no simple answer to this question. You need to understand how your system behaves with regard to memory, what operations you do on the container, and which of those operations are "important". But by understanding the concepts and what affects what, you can get a better understanding of how things work. So here's some "discussion" on the subject.

"Cache locality" is largely about "keeping things in the cache". In other words, if you look at A, then B, and A is located close to B, they are probably getting loaded into the cache together.

If objects are large enough to fill one or more cache lines (modern CPUs have cache lines of 64-128 bytes; mobile ones are sometimes smaller), the "next object in line" will not be in the cache anyway [1], so the cache locality of the "next element in the vector" matters less. The smaller the object, the more this effect matters, assuming you access objects in the order they are stored. If you access elements in random order, other factors start to become important [2], and cache locality is much less important.

On the other hand, as objects get larger, moving them within the vector (including growing, removing, inserting, as well as random shuffling) becomes more time-consuming, because more data has to be copied.

Of course, reading through a pointer always takes one more step than reading an element directly from a vector, since the pointer itself must be read before we can get to the actual data in the pointee object. Again, this matters more when accessing elements randomly.

I always start with "whatever is simplest" (which depends on the overall structure of the code; e.g. sometimes it's easier to create a vector of pointers because you have to create the objects dynamically in the first place). Most of the code in a system is not performance-critical anyway, so why worry about its performance? Just get it working, and leave it be if it doesn't show up in your performance measurements.

Also, if you are moving objects around a lot in a container, maybe vector isn't the best container. That's why there are multiple container variants (vector, list, map, tree, deque): they have different characteristics with regard to access and insert/remove, as well as for walking the data linearly.

Oh, and in your example you talk about 200 City objects: well, they are probably all going to fit in the cache of any modern CPU anyway. So stick them in a vector. Unless a City contains a list of every individual living in the city... but that should probably be a vector (or other container object) in itself.

As an experiment, write a program that does the same operations on a std::vector<int> and a std::vector<int*> [such as filling with random numbers, then sorting the elements]. Then make a large object [stick an array of integers in there, or some such] with one integer, so that you can do the very same operations on that. Vary the size of the stored object and see how it behaves: on YOUR system, where does having pointers start to pay off over having plain objects? Of course, also vary the number of elements, to see what effect that has.

[1] Well, modern processors use cache-prefetching, which MAY load "next data" into the cache speculatively, but we certainly can't rely on this.

[2] An extreme case of this is a telephone exchange with a large number of subscribers (millions). When a call is placed, the caller and callee are looked up in a table. But the chance of either caller or callee being in the cache is nearly zero, because (assuming we're dealing with a large city, say London) the number of calls placed and received every second is quite large. So caches become useless, and it gets worse, because the processor also caches page-table entries (in the TLB), and those are most likely not cached either. For these sorts of applications, CPU designers provide "huge pages", which means that memory is split into 1GB pages instead of the usual 4K pages or the 2MB pages that have been around for a while. This reduces the amount of memory that must be read before "we get to the right place". Of course, the same applies to various other "large database, unpredictable access pattern" workloads: airlines, Facebook and Stack Overflow all have these sorts of problems.
