Reputation: 3839
I am writing a small renderer based on the rasterisation algorithm. It's a personal project for testing different techniques. While measuring the time it takes to render a bunch of triangles, I noticed something strange. The program writes to an image buffer (a 1D array of Vec3<unsigned char>) whenever a given pixel overlaps a 2D triangle and passes some other tests; the value written is the color of that triangle.
Vec3<unsigned char> *fb = new Vec3<unsigned char>[w * h];
...
void rasterize(
    ...,
    Vec3<unsigned char> *&fb,
    float *&zbuffer)
{
    Vec3<unsigned char> randcol(drand48() * 255, drand48() * 255, drand48() * 255);
    ...
    uint32_t x, y;
    // loop over bounding box of triangle
    // check if given pixel is in triangle
    for (y = ymin, p.y = ymin; y <= ymax; ++y, ++p.y)
    {
        for (x = xmin, p.x = xmin; x <= xmax; ++x, ++p.x)
        {
            if (pixelOverTriangle(...)) {
                fb[y * w + x] = randcol;
            }
        }
    }
}
Before measuring, I expected the rendering itself (looping over the triangles, running all the tests, etc.) to take the longest. But when I run the program with a given number of triangles, I get the following render time:
74 ms
But when I comment out the line where I write to the image buffer I get:
5 ms
So to be clear I do:
if (pixelOverTriangle(...)) {
    // fb[y * w + x] = randcol;
}
In other words, more than 90% of the time is spent writing to the image buffer!
I should add that I tried optimising how the index used to access the array elements is computed, but this is not where the time goes. The time goes into actually copying the value on the right-hand side into the buffer (or so it seems, anyway).
I am very surprised by these numbers.
So I have a few questions:
Upvotes: 0
Views: 132
Reputation: 316
A lot more goes into a memory read or write than C++ makes it seem. Your processor caches blocks of memory (cache lines, commonly 64 bytes) for quick access, which vastly improves performance for data in contiguous memory: arrays, structs, and the stack, for example. However, when you access memory that is not in the cache (a cache miss), the processor has to fetch a new block from main memory, which takes significantly longer: typically a hundred cycles or more, which is minutes if you scale one cycle up to a second. Accessing scattered parts of a long block of memory, like your image, can make cache misses very frequent.
To make matters worse, memory is managed in virtual pages (usually 4 KB each) that the operating system maps to physical memory on demand. A freshly allocated image that spans many pages can trigger a page fault the first time each page is written, and if the system is short on physical memory, pages may even be swapped out to secondary storage (your hard drive), which takes far longer than a direct read from memory.
I found an article via another Stack Overflow question about cache performance that might answer your question better than I can. Really, it's just important to be aware of what a memory read/write is actually doing, and how that can drastically affect performance.
Upvotes: 2
Reputation: 54383
A possible answer which you'll have to check out...
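With the write commented out, the compiler might notice that your loop no longer has any observable effect and remove it entirely, tests included. In that case the 5 ms figure would not be measuring the triangle tests at all. Look at the disassembly of the function and see whether it is actually doing any calculations.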
Upvotes: 1