Reputation: 14309
I'm optimizing a matrix numerical hotspot.
Currently, I'm doing blocking and loop unrolling to improve performance. However, I deliberately avoid peeling the borders. Instead, I let the blocked loops overrun the matrix edges, and of course the algorithm then touches uninitialized values.
However, the matrix is generously pre-allocated to cope with the overflow so I am not actually illegally accessing a memory location.
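For concreteness, here is a minimal sketch of the layout being described (the names, the operation, and the block size are assumptions, not the asker's actual code): the matrix is over-allocated to a multiple of the block size, so the blocked loops can run past the logical border without any out-of-bounds access.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t B = 8;  // block size (assumed for illustration)

// Scale an n x n matrix in place, blocked, without peeling the borders.
// The buffer is padded up to the next multiple of B in each dimension,
// so the inner loops may touch padding: uninitialized from the
// algorithm's point of view, but legally allocated memory.
void scale_blocked(std::vector<double>& a, std::size_t n, double s) {
    const std::size_t np = (n + B - 1) / B * B;  // padded leading dimension
    for (std::size_t ib = 0; ib < np; ib += B)
        for (std::size_t jb = 0; jb < np; jb += B)
            for (std::size_t i = ib; i < ib + B; ++i)
                for (std::size_t j = jb; j < jb + B; ++j)
                    a[i * np + j] *= s;          // may touch the padding
}
```

The caller would allocate `np * np` doubles; only the top-left `n x n` region holds meaningful data.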
I don't do peeling for several reasons:
However, I am wondering whether these overflowing accesses that touch uninitialized values actually cause a performance hit.

I know exactly where the uninitialized accesses happen, and they are also reported by valgrind. I have also profiled the code with Intel's VTune and could not find any sign of degraded performance due to them.
Upvotes: 3
Views: 226
Reputation: 471199
Just to get pedantic stuff out of the way:
According to the standard, bad things can happen if you use uninitialized data. (The standard allows for trap representations that could trigger exceptions.) But for all practical purposes, this probably doesn't apply here.
If you're dealing with integers, accessing and operating on uninitialized data will have no effect on performance. (Aside from division, integer operations generally have fixed, data-independent latency.)
For floating-point, there are two problems:
Depending on the environment, signalling NaNs may trigger a hardware exception. So this would actually be a correctness issue, not just a performance issue.
It may seem counter-intuitive that denormalized floats have anything to do with this. However, uninitialized data has a high probability of being denormalized when interpreted as floating-point.
And you really don't want to be messing with denormalized floating-point.
So if you're unlucky enough for the uninitialized values to have even one denormalized value, you can expect a nasty 100+ cycle penalty at the end of each loop iteration. Now depending on how large the loops are, this may or may not matter.
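You can see the penalty directly with a dependent chain of multiplies, once with a normal operand and once with a subnormal one. This is a rough sketch, not a rigorous benchmark; the magnitude of the gap varies by CPU, and some cores handle subnormals at full speed:

```cpp
#include <chrono>

// Time a dependent chain of 10M multiplies. With a subnormal x, every
// multiply can take a microcode assist (100+ cycles on some x86 cores)
// instead of the usual handful of cycles.
inline double time_chain(volatile double x, double m) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 10'000'000; ++i)
        x = x * m;                  // volatile x defeats constant folding
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling `time_chain(1.0, 1.0000001)` keeps the operand normal throughout (it grows to about e ≈ 2.72), while `time_chain(std::numeric_limits<double>::denorm_min(), 1.0000001)` keeps it subnormal on every iteration; on hardware that microcodes subnormal arithmetic, the second call is dramatically slower.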
That said, why is uninitialized data prone to be denormalized? In IEEE 754, a value whose exponent bits (the bits just below the sign bit) are all zero is denormalized (or zero, if the mantissa is zero as well). If the memory used to hold a small integer, its high bits are zero, so reinterpreted as a floating-point value the exponent field is zero and the value is denormalized. Typical 64-bit user-space pointers have their high bits zero too, and end up the same way.
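This claim is easy to check: reinterpret an integer's bit pattern as a double and classify it (a sketch using `memcpy` for the type pun, which is well-defined on any standard; the helper name is made up):

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// True if the 64-bit pattern, read as an IEEE-754 double, is subnormal.
// A double is subnormal when its 11 exponent bits (just below the sign
// bit) are all zero and the mantissa is nonzero -- exactly the shape of
// most small integers and typical user-space pointers.
inline bool bits_are_subnormal(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);      // well-defined type pun
    return std::fpclassify(d) == FP_SUBNORMAL;
}
```

For example, the bit pattern of the integer 42 classifies as subnormal, while all-zero bits give exact zero and `0x3FF0000000000000` (the pattern of 1.0) is a perfectly normal double.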
Suggestions:
Upvotes: 5