underscore_d
underscore_d

Reputation: 6795

Why is a volatile local variable optimised differently from a volatile argument, and why does the optimiser generate a no-op loop from the latter?

Background

This was inspired by this question/answer and ensuing discussion in the comments: Is the definition of “volatile” this volatile, or is GCC having some standard compliancy problems?. Based on others' and my interpretation of what should happening, as discussed in comments, I've submitted it to GCC Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71793 Other relevant responses are still welcome.

Also, that thread has since given rise to this question: Does accessing a declared non-volatile object through a volatile reference/pointer confer volatile rules upon said accesses?

Intro

I know volatile isn't what most people think it is and is an implementation-defined nest of vipers. And I certainly don't want to use the below constructs in any real code. That said, I'm totally baffled by what's going on in these examples, so I'd really appreciate any elucidation.

My guess is this is due to either highly nuanced interpretation of the Standard or (more likely?) just corner-cases for the optimiser used. Either way, while more academic than practical, I hope this is deemed valuable to analyse, especially given how typically misunderstood volatile is. Some more data points - or perhaps more likely, points against it - must be good.

Input

Given this code:

#include <cstddef>

void f(void *const p, std::size_t n)
{
    unsigned char *y = static_cast<unsigned char *>(p);
    volatile unsigned char const x = 42;
    // N.B. Yeah, const is weird, but it doesn't change anything

    while (n--) {
        *y++ = x;
    }
}

void g(void *const p, std::size_t n, volatile unsigned char const x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = x;
    }
}

void h(void *const p, std::size_t n, volatile unsigned char const &x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = x;
    }
}

int main(int, char **)
{
    int y[1000];
    f(&y, sizeof y);
    volatile unsigned char const x{99};
    g(&y, sizeof y, x);
    h(&y, sizeof y, x);
}

Output

g++ from gcc (Debian 4.9.2-10) 4.9.2 (Debian stable a.k.a. Jessie) with the command line g++ -std=c++14 -O3 -S test.cpp produces the below ASM for main(). Version Debian 5.4.0-6 (current unstable) produces equivalent code, but I just happened to run the older one first, so here it is:

main:
.LFB3:
    .cfi_startproc

# f()
    movb    $42, -1(%rsp)
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L21:
    subq    $1, %rax
    movzbl  -1(%rsp), %edx
    jne .L21

# x = 99
    movb    $99, -2(%rsp)
    movzbl  -2(%rsp), %eax

# g()
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L22:
    subq    $1, %rax
    jne .L22

# h()
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L23:
    subq    $1, %rax
    movzbl  -2(%rsp), %edx
    jne .L23

# return 0;
    xorl    %eax, %eax
    ret
    .cfi_endproc

Analysis

All 3 functions are inlined, and both that allocate volatile local variables do so on the stack for fairly obvious reasons. But that's about the only thing they share...

Adding #define volatile /**/ leads to main() being a no-op, as you'd expect. So, when present, even on a local variable volatile does do something... I just have no idea what in the case of g(). What on Earth is going on there?

Questions

Is this a weird corner case due to order of optimising analyses or such? As the code is a daft thought-experiment, I wouldn't chastise GCC for this, but it'd be good to know for sure. (Or is g() the manual timing loop people have dreamt of all these years?) If we conclude there's no Standard bearing on any of this, I'll move it to their Bugzilla just for their information.

And of course, the more important question from a practical perspective, though I don't want that to overshadow the potential for compiler geekery... Which, if any of these, are well-defined/correct according to the Standard?

Upvotes: 9

Views: 2357

Answers (1)

avdgrinten
avdgrinten

Reputation: 181

For f: GCC eliminates the non-volatile stores (but not the loads, which can have side-effects if the source location is a memory mapped hardware register). There is really nothing surprising here.

For g: Because of the x86_64 ABI the parameter x of g is allocated in a register (i.e. rdx) and does not have a location in memory. Reading a general purpose register does not have any observable side effects so the dead read gets eliminted.

Upvotes: 2

Related Questions