Reputation: 1289
I have a relatively small C function.
#define ROTL16(x, r) (((x) << (r)) | (x >> (16 - (r))))
void a_inverse(uint16_t *left, uint16_t *right) {
*right ^= *left;
*right = ROTL16(*right, 14);
*left -= *right;
*left = ROTL16(*left, 7);
}
The function was called by submitting uint16_t
pointers that point to low and hight 16-bit of a single uint32_t
integer.
uint32_t whole;
uint16_t * left = (uint16_t *)&whole;
uint16_t * right = left + 1;
a_inverse(left, right);
But the function does not perform as expected. So I blindly add the volatile keyword.
#define ROTL16(x, r) (((x) << (r)) | (x >> (16 - (r))))
void a_inverse(volatile uint16_t *left, volatile uint16_t *right) {
*right ^= *left;
*right = ROTL16(*right, 14);
*left -= *right;
*left = ROTL16(*left, 7);
}
This time it works correctly.
Comparing the output assembly with -O3 I see nothing wrong
This is a non-volatile version
a_inverse:
.cfi_startproc
endbr64
movzwl (%rsi), %ea
xorw (%rdi), %ax
rorw $2, %ax
movw %ax, (%rsi)
movzwl (%rdi), %edx
subl %eax, %edx
movl %edx, %eax
rolw $7, %ax
movw %ax, (%rdi)
ret
.cfi_endproc
And this is the one with volatile
keyword
a_inverse:
.cfi_startproc
endbr64
movzwl (%rdi), %edx
movzwl (%rsi), %eax
xorl %edx, %eax
movw %ax, (%rsi)
movzwl (%rsi), %eax
movzwl (%rsi), %edx
sall $14, %eax
shrw $2, %dx
orl %edx, %eax
movw %ax, (%rsi)
movzwl (%rsi), %edx
movzwl (%rdi), %eax
subl %edx, %eax
movw %ax, (%rdi)
movzwl (%rdi), %eax
movzwl (%rdi), %edx
sall $7, %eax
shrw $9, %dx
orl %edx, %eax
movw %ax, (%rdi)
ret
.cfi_endproc
And I don't understand why do the functions behave differently.
Could you help me understand what the reason might be?
The only difference in expected/unexpected results is those volatile
usages.
Removing either one of the volatile
would result in an unexpected result.
Upvotes: 0
Views: 121
Reputation: 180715
The main problem is unlikely to be in function a_inverse()
itself, and the fact that volatile
-qualifying the arguments changed the behavior is a happenstance if the data are not actually volatile.
In all likelihood, the locus of the problem is actually in the caller. Indeed, in comments, you describe computing the argument pointers something like this:
uint32_t *whole = ...;
uint16_t *left = (uint16_t *) whole;
uint16_t *right = left + 1;
which would presumably be followed up by a call such as
a_inverse(left, right);
. The cast to type uint16_t *
is permitted. The permissibility of the pointer arithmetic is debatable, but in practice it is unlikely to be an issue. But because the resulting left
and right
pointers don't actually point to uint16_t
objects, attempting to use them to access the object to which they point violates the strict aliasing rule (C18, paragraph 6.5/7), and therefore produces undefined behavior.
The strict aliasing rule says that an object can be accessed only via an lvalue of compatible type +/- qualification and signedness differences, possibly in a union, or via an lvalue of character type. (Among the key implications of the latter is that you may access any object's representation via a char *
.) This allows compilers to use data-type analysis to inform optimization choices.
GCC has a flag, -fno-strict-aliasing
, that tells it not to use type analysis to make optimization decisions -- basically, forcing it to assume that any pointer might alias any other. If you don't want to fix the issue then using that flag would likely mitigate it.
On the other hand, I suspect that all the pointer (mis)use is a misguided attempt at optimization. I would rewrite the function without:
uint32_t a_inverse(uint32_t whole) {
uint16_t left = whole; // x86 is little-endian
uint16_t right = whole >> 16;
right ^= left;
right = ROTL16(right, 14);
left -= right;
left = ROTL16(left, 7);
return right << 16 + left;
}
... and expect it to be at least as fast, not least because there are no aliasing risks to limit available optimizations.
Upvotes: 4