skywind3000

Reputation: 468

Is it a gcc -O2 optimization bug (different result from -O1)?

I wrote a very simple program; it behaves normally without -O2:

#include <stdio.h>
#include <stdint.h>

int main()
{
    uint32_t A[4] = { 1, 2, 3, 4 };
    float B[4] = { 0, 0, 0, 0 };
    float C[4] = { 5, 6, 7, 8 };
    int i;

    // convert integer A to float B
    for (i = 0; i < 4; i++) 
        B[i] = (float)A[i];

    // memory copy from B to C
    uint32_t *src = (uint32_t*)(B);
    uint32_t *dst = (uint32_t*)(C);
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];

#if 0
    // open this to correct the error
    __asm__("":::"memory");
#endif

    // print C, C should be [1.0, 2.0, 3.0, 4.0]
    for (i = 0; i < 4; i++) 
        printf("%f\n", C[i]);

    return 0;
}

Compile without -O2:

$ gcc error.c -o error
$ ./error
1.000000
2.000000
3.000000
4.000000

It works as expected. But if I add -O2:

$ gcc -O2 error.c -o error
$ ./error
-6169930235904.000000
0.000000
-6169804406784.000000
0.000000

In addition, if you switch #if 0 to #if 1, it works correctly again. The __asm__("":::"memory") barrier should be unnecessary within a single thread.

Is this a bug in the -O2 optimization?

Is there anything I can tell the compiler to make it handle this correctly? I have a function that stores an xmm register to a (void*) pointer, like:

inline void StoreRegister(void *ptr, const __m128& reg)
{
#if DONT_HAVE_SSE
    const uint32_t *src = reinterpret_cast<const uint32_t*>(&reg);
    uint32_t *dst = reinterpret_cast<uint32_t*>(ptr);
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
#else
    _mm_storeu_si128(reinterpret_cast<__m128i*>(ptr), _mm_castps_si128(reg));
#endif
}

The dst is the C in the code above. Is there any way to make it correct without modifying the function signature?

Upvotes: 2

Views: 978

Answers (1)

Bathsheba

Reputation: 234635

No, this is not a manifestation of a compiler bug.

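Rather, the behaviour of your code is undefined because you dereference the result of the cast (uint32_t*)(B), and likewise (uint32_t*)(C). B and C are arrays of float, and reading or writing a float object through a uint32_t lvalue is a violation of strict aliasing.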
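As a minimal sketch of the usual way around this, assuming float is 32 bits wide: memcpy copies the object representation byte by byte, which the aliasing rules always permit. The helper names copy4 and float_bits are made up for illustration and do not appear in the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this target. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Copy four floats from src to dst without viewing them as uint32_t.
   memcpy accesses the bytes as characters, which never breaks aliasing. */
static void copy4(float *dst, const float *src)
{
    memcpy(dst, src, 4 * sizeof *dst);
}

/* Read the bit pattern of a float without an aliasing cast. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Replacing the four dst[i] = src[i] assignments with copy4(C, B) keeps the program well defined at every optimization level, and gcc typically compiles a small fixed-size memcpy like this into plain loads and stores.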

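Compilers, particularly gcc, are becoming more and more aggressive in how they treat undefined constructs. The standard allows them to assume that undefined behaviour does not occur, so they can remove any branch that contains it. In your program, that lets the optimizer assume that a store through a uint32_t* cannot change any float object, so it is free to reorder or drop those stores relative to the later reads of C.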
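For the non-SSE fallback in your StoreRegister, the same idea applies and the caller-visible signature does not need to change. The following is only a sketch in C: vec4f is a hypothetical stand-in for __m128 on a target without SSE, and store_register is an illustrative name, not your actual function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for __m128: four packed floats. */
typedef struct { float lane[4]; } vec4f;

/* Assumption: the struct has no padding, so it is exactly 16 bytes. */
static_assert(sizeof(vec4f) == 4 * sizeof(float), "vec4f must be packed");

/* Store the register contents to an untyped destination.
   memcpy copies the bytes, so the type of the object behind ptr
   (float, uint32_t, ...) does not matter for aliasing purposes. */
static void store_register(void *ptr, const vec4f *reg)
{
    memcpy(ptr, reg, sizeof *reg);
}
```

If changing the code is not an option, gcc also accepts -fno-strict-aliasing, which turns off optimizations based on this rule at some cost in generated code quality.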

Upvotes: 11
