Reputation: 262

Vector re-declaration versus insertions in looping operations - C++

I have an option to either create and destroy a vector on every call to func() and push elements in each iteration, as shown in Example A OR fixed the initialization and only overwrite old values in each iteration, as shown in Example B.

Example A:

void func () 
{
    std::vector<double> my_vec(5, 0.0);
    for ( int i = 0; i < my_vec.size(); i++) {
        my_vec.push_back(i);
        // do something
    }
}

while (condition) {
    func();
}

Example B:

void func (std::vector<double>& my_vec) 
{
    for ( int i = 0; i < my_vec.size(); i++) {
        my_vec[i] = i;
        // do something
    }
}

while (condition) {
    std::vector<double> my_vec(5, 0.0);
    func(myVec);
}

Which of the two would be computationally inexpensive. The size of the array won't be more than 10.

Upvotes: 0

Answers (1)

JaMiT

Reputation: 17005

I still suspect that the question that was asked is not the question that was intended, but it occurred to me that the main point of my answer would likely not change. If the question gets updated, I can always edit this answer to match (or delete it, if it turns out to be inapplicable).

De-prioritize optimizations

There are various factors that should affect how you write your code. Among the desirable goals are space optimization, time optimization, data encapsulation, logic encapsulation, readability, robustness, and correct functionality. Ideally, all of these goals would be achievable in every piece of code, but that is not particularly realistic. Much more likely is a situation where one or more of these goals must be sacrificed in favor of the others. When that happens, optimizations should typically yield to everything else.

That is not to say that optimizations should be ignored. There are plenty of optimizations that rarely obstruct the higher-priority goals. These range from the small, such as passing by const reference instead of by value, to the large, such as choosing the logarithmic algorithm instead of the exponential one. However, the optimizations that do interfere with the other goals should be postponed until after your code is reasonably complete and functioning correctly. At that point, a profiler should be used to determine where the bottlenecks actually are. Those bottlenecks are the only places where other goals should yield to optimizations, and only if the profiler confirms that the optimizations achieved their goals.

For the question being asked, this means that the main concern should not be computational expense, but encapsulation. Why should the caller of func() need to allocate space for func() to work with? It should not, unless a profiler identified this as a performance bottleneck. And if a profiler did that, it would be much easier (and more reliable!) to ask the profiler if the change helps than to ask Stack Overflow.

Why?

I can think of two major reasons to de-prioritize optimizations. First, the "sniff test" is unreliable. While there might be a few people who can identify bottlenecks by looking at code, there are many, many more who merely think they can. Second, that's why we have optimizing compilers. It is not unheard of for someone to come up with this super-clever optimization trick only to discover that the compiler was already doing it. Keep your code clean and let the compiler handle the routine optimizations. Only step in when the task demonstrably exceeds the compiler's capabilities.

See also: premature-optimization

Choosing an optimization

OK, suppose the profiler did identify construction of this small, 10-element array as a bottleneck. The next step is to test an alternative, right? Almost. First you need an alternative, and I would consider a review of the theoretical benefits of various alternatives to be useful. Just keep in mind that this is theoretical and that the profiler gets the final say. So I'll go into the pros and cons of the alternatives from the question, as well as some other alternatives that might bear consideration. Let's start from the worst options, working our way to the better ones.

Example A

In Example A, a vector is created with 5 elements, then elements are pushed onto the vector until i meets or exceeds the vector's size. Seeing how i and the vector's size are both increased by one each iteration (and i starts smaller than the size), this loop will run until the vector grows large enough to crash the program. That means probably billions of iterations (despite the question's claim that the size will not exceed 10).

Easily the most computationally expensive option. Don't do this.

Example B

In example B, a vector is created for each iteration of the outer while loop, which is then accessed by reference from within func(). The performance cons here include passing a parameter to func() and having func() access the vector indirectly through a reference. There are no performance pros as this does everything the baseline (see below) would do, plus some extra steps.

Even though a compiler might be able to compensate for the cons, I see no reason to try this approach.

Baseline

The baseline I'm using is a fix to Example A's infinite loop. Specifically, replace "my_vec.push_back(i);" with Example B's "my_vec[i] = i;". This simple approach is along the lines of what I would expect for the initial assessment by the profiler. If you cannot beat simple, stick with it.

Example B*

The text of the question presents an inaccurate assessment of Example B. Interestingly, the assessment describes an approach that has the potential to improve on the baseline. To get code that matches the textual description, move Example B's "std::vector<double> my_vec(5, 0.0);" to the line immediately before the while statement. This has the effect of constructing the vector only once, rather than constructing it with each iteration.

The cons of this approach are the same as those of Example B as originally coded. However, we now pick up a gain in that the vector's constructor is called only once. If construction is more expensive than the indirection costs, the result should be a net improvement once the while loop iterates often enough. (Beware these conditions: that's a significant "if" and there is no a priori guess as to how many iterations is "enough".) It would be reasonable to try this and see what the profiler says.

Get some static

A variant on Example B* that helps preserve encapsulation is to use the baseline (the fixed Example A), but precede the declaration of the vector with the keyword static. This brings in the benefit of constructing the vector only once, but without the overhead associated with making the vector a parameter. In fact, the benefit could be greater than in Example B* since construction happens only once per program execution, rather than each time the while loop is started. The more times the while loop is started, the greater this benefit.

The main con here is that the vector will occupy memory throughout the program's execution. Unlike Example B*, it will not release its memory when the block containing the while loop ends. Using this approach in too many places would lead to memory bloat. So while it is reasonable to profile this approach, you might want to consider other options. (Of course if the profiler calls this out as the bottleneck, dwarfing all others, the cost is small enough to pay.)

Fix the size

My personal choice for what optimization to try here would be to start from the baseline and switch the vector to std::array<10,double>. My main motivation is that the needed size won't be more than 10. Also relevant is that the construction of a double is trivial. Construction of the array should be on par with declaring 10 variables of type double, which I would expect to be negligible. So no need for fancy optimization tricks. Just let the compiler do its thing.

The expected possible benefit of this approach is that a vector allocates space on the heap for its storage, which has an overhead cost. The local array would not have this cost. However, this is only a possible benefit. A vector implementation might already take advantage of this performance consideration for small vectors. (Maybe it does not use the heap until the capacity needs to exceed some magic number, perhaps more than 10.) I would refer you back to earlier when I mentioned "super-clever" and "compiler was already doing it".

I'd run this through the profiler. If there's no benefit, there is likely no benefit from the other approaches. Give them a try, sure, since they're easy enough, but it would probably be a better use of your time to look at other aspects to optimize.

Upvotes: 1