Performance impact when resizing vector within capacity

Question

I have the following synthesized example of my code:

#include 
#include 
#include 

#define CAPACITY 10000

int main() {
    std::vector> a;
    std::vector> b;

    a.resize(CAPACITY, std::vector {0, 0})
    b.resize(CAPACITY, std::array {0, 0})

    for (;;) {
        size_t new_rand_size = (std::rand() % CAPACITY);

        a.resize(new_rand_size);
        b.resize(new_rand_size);

        for (size_t i = 0; i < new_rand_size; ++i) {
            a[i][0] = std::rand();
            a[i][1] = std::rand();
            b[i][0] = std::rand();
            b[i][1] = std::rand();
        }

        process(a); // respectively process(b)
    }
}

so obviously, the array version is better, because it requires less allocation, as the array is fixed in size and continuous in memory (correct?). It just gets reinitialized when up-resizing again within capacity.

Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization (e.g. by overwriting the allocator or similar) to optimize the code even further.

JaMiT · Accepted Answer

so obviously,

The word "obviously" is typically used to mean "I really, really want the following to be true, so I'm going to skip the part where I determine if it is true." ;) (Admittedly, you did better than most since you did bring up some reasons for your conclusion.)

the array version is better, because it requires less allocation, as the array is fixed in size and continuous in memory (correct?).

The truth of this depends on the implementation, but the there is some validity here. I would go with a less micro-managementy approach and say that the array version is preferable because the final size is fixed. Using a tool designed for your specialized situation (fixed size array) tends to incur less overhead than using a tool for a more general situation. Not always less, though.

Another factor to consider is the cost of default-initializing the elements. When a std::array is constructed, all of its elements are constructed as well. With a std::vector, you can defer constructing elements until you have the parameters for construction. For objects that are expensive to default-construct, you might be able to measure a performance gain using a vector instead of an array. (If you cannot measure a difference, don't worry about it.)

When you do a comparison, make sure the vector is given a fair chance by using it well. Since the size is known in advance, reserve the required space right away. Also, use emplace_back to avoid a needless copy.

Final note: "contiguous" is a bit more accurate/descriptive than "continuous".

It just gets reinitialized when up-resizing again within capacity.

This is a factor that affects both approaches. In fact, this causes your code to exhibit undefined behavior. For example, let's suppose that your first iteration resizes the outer vector to 1, while the second resizes it to 5. Compare what your code does to the following:

std::vector> a;
a.resize(CAPACITY, std::vector {0, 0});
a.resize(1);
a.resize(5);
std::cout << "Size " << a[1].size() <<".
";

The output indicates that the size is zero at this point, yet your code would assign a value to a[1][0]. If you want each element of a to default to a vector of 2 elements, you need to specify that default each time you resize a, not just initially.

Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization (e.g. by overwriting the allocator or similar) to optimize the code even further.

Yes, you can skip the initialization. In fact, it is advisable to do so. Use the tool designed for the task at hand. Your initialization serves to increase the capacity of your vectors. So use the method whose sole purpose is to increase the capacity of a vector: vector::reserve.

Another option – depending on the exact situation — might be to not resize at all. Start with an array of arrays, and track the last usable element in the outer array. This is sort of a step backwards in that you now have a separate variable for tracking the size, but if your real code has enough iterations, the savings from not calling destructors when the size decreases might make this approach worth it. (For cleaner code, write a class that wraps the array of arrays and that tracks the usable size.)

Performance impact when resizing vector within capacity

Answers (2)

Related Questions