Reputation: 150
In a certain library (FFTW: discrete Fourier transform computation), I came across a header file which contains the following comment and some #defines following that. The comment talks about some programming trick. But I'm not able to understand what exactly this programming trick is. Could someone please explain ?
/* hackery to prevent the compiler from ``optimizing'' induction
variables in codelet loops. The problem is that for each K and for
each expression of the form P[I + STRIDE * K] in a loop, most
compilers will try to lift an induction variable PK := &P[I + STRIDE * K].
For large values of K this behavior overflows the
register set, which is likely worse than doing the index computation
in the first place.
If we guess that there are more than
ESTIMATED_AVAILABLE_INDEX_REGISTERS such pointers, we deliberately confuse
the compiler by setting STRIDE ^= ZERO, where ZERO is a value guaranteed to
be 0, but the compiler does not know this.
16 registers ought to be enough for anybody, or so the amd64 and ARM ISA's
seem to imply.
*/
#define ESTIMATED_AVAILABLE_INDEX_REGISTERS 16
#define MAKE_VOLATILE_STRIDE(nptr, x) \
(nptr <= ESTIMATED_AVAILABLE_INDEX_REGISTERS ? \
0 : \
((x) = (x) ^ X(an_INT_guaranteed_to_be_zero)))
#endif /* PRECOMPUTE_ARRAY_INDICES */
Upvotes: 2
Views: 50
Reputation: 16125
The optimization: Instead of recalculating the index of the array every time an iteration in the loop occurs, some compilers anticipate the next addresses and place these in registers because the indexing expression is predictable.
The problem: Some indexing expressions (like I + STRIDE * K
) may result in using a lot of registers this way, and if this number exceeds the total amount of registers, some register values will be pushed to stack memory, including other variables that the loop might be using.
The trick: In order to force a compiler to not use this optimization, an external integer is used. Adding or XOR'ing this zero and storing it in x
is a no-op that "taints" the stride, and consequently the index expression, making it unpredictable by the optimization analysis. It can no longer infer how this variable behaves even though we know it to behave very zero-like. A relevant extract of the file ifftw.h from which this is derived:
extern const INT X(an_INT_guaranteed_to_be_zero);
#ifdef PRECOMPUTE_ARRAY_INDICES
...
#define MAKE_VOLATILE_STRIDE(nptr, x) (x) = (x) + X(an_INT_guaranteed_to_be_zero)
#else
...
#define ESTIMATED_AVAILABLE_INDEX_REGISTERS 16
#define MAKE_VOLATILE_STRIDE(nptr, x) \
(nptr <= ESTIMATED_AVAILABLE_INDEX_REGISTERS ? \
0 : \
((x) = (x) ^ X(an_INT_guaranteed_to_be_zero)))
#endif /* PRECOMPUTE_ARRAY_INDICES */
This optimization is either attempted avoided completely, or allowed on the condition that the indices can fit into a guess at the number of available registers. The way it allows the optimization is by using a constant zero.
Some etymology: The macro MAKE_VOLATILE_STRIDE
derives its name from the volatile keyword which indicates that a value may change between different accesses, even if it does not appear to be modified. This keyword prevents an optimizing compiler from optimizing away subsequent reads or writes and thus incorrectly reusing a stale value or omitting writes. (Wikipedia)
Why the volatile keyword, rather than XOR'ing an external value, is not sufficient, I don't know.
Upvotes: 1