Reputation: 201
I am trying to optimize a C++ routine. The main bottleneck in this routine is the push_back() of a vector of objects. I tried using a deque instead and even tried a list. But strangely (and contrary to theory) deque and list implementations run much slower than the vector counterpart.
In fact even clear() runs much slower for the deque and list implementations than the vector counterpart. In this case too, Vector implementation seems to be the fastest while list implementation is the slowest.
Any pointers?
Note: vector reserve() could have sped the implementation but cannot be done as it is unknown in size.
Thanks.
Upvotes: 12
Views: 8248
Reputation: 24902
vector being faster to build or clear than deque or list is to be expected; it's a simpler data structure.
With regard to vector::push_back
, it has to do two things:
You can generally speed things up by eliminating step 1 by simply resizing the vector and using operator[]
to set items.
UPDATE: Original poster asked for an example. The code below times 128 mega insertions, and outputs
push_back : 2.04s
reserve & push_back : 1.73s
resize & place : 0.48s
when compiled and run with g++ -O3 on Debian/Lenny on an old P4 machine.
#include <iostream>
#include <time.h>
#include <vector>
int main(int,char**)
{
const size_t n=(128<<20);
const clock_t t0=clock();
{
std::vector<unsigned char> a;
for (size_t i=0;i<n;i++) a.push_back(i);
}
const clock_t t1=clock();
{
std::vector<unsigned char> a;
a.reserve(n);
for (size_t i=0;i<n;i++) a.push_back(i);
}
const clock_t t2=clock();
{
std::vector<unsigned char> a;
a.resize(n);
for (size_t i=0;i<n;i++) a[i]=i;
}
const clock_t t3=clock();
std::cout << "push_back : " << (t1-t0)/static_cast<float>(CLOCKS_PER_SEC) << "s" << std::endl;
std::cout << "reserve & push_back : " << (t2-t1)/static_cast<float>(CLOCKS_PER_SEC) << "s" << std::endl;
std::cout << "resize & place : " << (t3-t2)/static_cast<float>(CLOCKS_PER_SEC) << "s" << std::endl;
return 0;
}
Upvotes: 7
Reputation: 41509
You have to choose your container according to what you're going to do with it.
Relevant actions are: extending (with push
), insertion (may not be needed at all), extraction, deletion.
At cplusplus.com, there is a very nice overview of the operations per container type.
If the operation is push
-bound, it makes sense that the vector beats all others. The good thing about deque is that it allocates fixed chunks, so will make more efficient use of fragmented memory.
Upvotes: 0
Reputation: 1946
"push_back()" can be slow if the copy of an object is slow. If the default constructor is fast and you have a way tu use swap to avoid the copy, you could have a much faster program.
void test_vector1()
{
vector<vector<int> > vvi;
for(size_t i=0; i<100; i++)
{
vector<int> vi(100000, 5);
vvi.push_back(vi); // copy of a large object
}
}
void test_vector2()
{
vector<int> vi0;
vector<vector<int> > vvi;
for(size_t i=0; i<100; i++)
{
vector<int> vi(100000, 5);
vvi.push_back(vi0); // copy of a small object
vvi.back().swap(vi); // swap is fast
}
}
Results :
VS2005-debug
* test_vector1 -> 297
* test_vector2 -> 172
VS2005-release
* test_vector1 -> 203
* test_vector2 -> 94
gcc
* test_vector1 -> 343
* test_vector2 -> 188
gcc -O2
* test_vector1 -> 250
* test_vector2 -> 156
Upvotes: 2
Reputation: 6416
If you want vector to be fast, you must reserve() enough space. It makes a huge difference, because each grow is terrible expensive. If you dont know, make a good guess.
Upvotes: 1
Reputation: 58786
If you don't know how many object you'll be adding it's very difficult to come up with an optimal solution. All you can do is try to minimize the cost that you know is happening - which in this case is that your vector is being constantly resized.
You could do this in two ways;
1) Split your operation into building and finalizing. This is where you build the list into a vector that is guaranteed to be big enough and when done copy it to another vector.
E.g.
std::vector<Foo> hugeVec;
hugeVec.reserve(1000); // enough for 1000 foo's
// add stuff
std::vector<Foo> finalVec;
finalVec = hugeVec;
2) Alternatively, when your vector is full call reserve with enough for another set of objects;
if (vec.capacity() == vec.size())
vec.reserve(vec.size() + 16); // alloc space for 16 more objects
You could choose a different container that did not result in all elements being copied upon a resize, but your bottleneck may then become the individual memory allocations for the new elements.
Upvotes: 3
Reputation: 23123
Regarding push_back() being slow and reserve being no help, the implementation of STL used in MSVC works something like this: When you first create a vector it reserves space for I think 10 elements. From then on, whenever it gets full, it reserves space for 1.5 times the number of elements in the vector. So, something like 10, 15, 22, 33, 49, 73, 105, 157... The re-allocations are expensive.
Even if you don't know the exact size, reserve() can be useful. reserve() doesn't prevent the vector from growing if it needs to. If you reserve() and the vector grows beyond that size, you have still improved things because of the reserve. If the vector turns out to be much smaller, well, maybe that's ok because the performance in general works better with smaller sizes.
You need to profile in RELEASE mode to know for sure what strategy works best.
Upvotes: 0
Reputation:
Deque has a more complex structure than vector and the speed differences between the two will be heavily dependent on both the specific implementation and the actual number of elements pushed back, but for large amounts of data it should be faster. clear() may be slower because it may choose to get rid of the more complex underlying structures. Much the same goes for list.
Upvotes: 0
Reputation: 340178
You'll need to give more information on the behavior of the routine.
In one place you're concerned about the speed of push_back()
in another you're concerned about clear()
. Are you building up the container, doing something then dumping it?
The results you see for clear()
are because vector<>
only has to release a singl block of memory, deque<>
has to release several, and list<>
has to release one for each element.
Upvotes: 0
Reputation: 34112
Are you pushing back the objects themselves, or a pointer to them? Pointers will usually be much faster as it's only 4-8 bytes to copy, compared to whatever the size of the objects are.
Upvotes: 2