Reputation: 75
I have the following code
#include <chrono>
#include <iostream>
#include <vector>
int main() {
struct Point {
double x, y, z;
};
const size_t sz = 1'000'000'00;
auto start = std::chrono::steady_clock::now();
std::vector<Point> points;
points.reserve(sz);
for (size_t i = 0; i < sz; ++i) {
const double d_val = i;
points.push_back({d_val, 2.0*d_val, 3*d_val});
}
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> diff = end-start;
std::cout << "filling: " << diff.count() << std::endl;
double tot{0};
for (const auto& p : points)
tot += p.x + p.y + p.z;
diff = std::chrono::steady_clock::now()-end;
std::cout << "reading: " << diff.count() << std::end;
std::cout << tot;
return 0;
}
The results I am getting are around
filling: 1.78711
reading: 0.233211
However, if I remove points.reserve(sz); I get the results
filling: 8.38341
reading: 1.6607
I get why filling takes longer time, but why is reading so much slower?
Edit: 1. I'm using xcode, LLVM 8.0, -O3,
I have a "cout << tot" (but did not include it here), so it should not be optimized away.
If I delay all the cout's till the end (by saving the time of "filling" in a variable) I still get a difference in "reading" time.
Upvotes: 2
Views: 109
Reputation: 62054
What may be happening is that you are suffering from paging overhead. That would explain why your results are so difficult to reproduce.
Your vector has 100 million entries, which occupy about 2.4 GB of memory if we assume only 24 bytes per entry.
On the first run, (with points.reserve(sz);
) all this memory is allocated at once, and your machine probably has enough RAM to satisfy the request, so everything happens as expected down the happy path.
In the second run, (without points.reserve(sz);
) the vector begins small, and keeps growing. The implementation of vector
uses an array, so when it wants to resize it allocates a new array, copies the old data, and then frees the old array. (Instead of doing realloc()
which would, conceivably, stand some remote chance of happening in-place.) So, every time the array gets resized, more memory is needed, and the old memory we leave behind is highly fragmented, so it will not be reused unless we first run out of memory, but modern machines tend to hit the paging file before they will admit that they have ran out of memory.
So, by the time the final allocation takes place, you have probably already run out of physical memory, and your logical memory is being paged. In such a scenario, the copying of the vector during the last resize may happen under conditions of thrashing: before the copy is complete, some of the pages that are part of your vector need to be paged out in order to make room for new pages that will also be part of your vector.
Then, when you try to read the entire vector, each one of those pages needs to be paged back in once, and thus the delay.
Upvotes: 1
Reputation: 1918
This is a wild guess, but it might be because of the cache effect. When you keep adding data to the vector after reserve(), some of the data may remain in the cache (write-back caching). If you keep adding data without reserve() the data must be relocated and this relocation of the vector data could possibly be done with cache write-through copy. You could test this by warming up the cache first with one read pass and then timing the second read.
Upvotes: 1
Reputation: 69902
The code does not use the aggregate value tot
. The compiler is free to optimise away the entire for loop.
Try this instead:
int main()
{
struct Point {
double x, y, z;
};
const size_t sz = 1'000'000'00;
auto start = std::chrono::steady_clock::now();
std::vector<Point> points;
points.reserve(sz);
for (size_t i = 0; i < sz; ++i) {
const double d_val = i;
points.push_back({d_val, 2.0*d_val, 3*d_val});
}
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> diff = end-start;
std::cout << "filling: " << diff.count() << std::endl;
double tot{0};
for (const auto& p : points)
tot += p.x + p.y + p.z;
diff = std::chrono::steady_clock::now()-end;
// <== here
std::cout << tot << std::endl;
// <== here
std::cout << "reading: " << diff.count() << std::endl;
}
Upvotes: 0