Passer By

Reputation: 21160

Volatile specifier ignored in C++

I'm pretty new to C++ and recently I ran across some info on what it means for a variable to be volatile. As far as I understood, it means a read or write to the variable can never be optimized out of existence.

However, a weird situation arises when I declare a volatile variable that isn't 1, 2, 4, or 8 bytes large: the compiler (GNU with C++11 enabled) seemingly ignores the volatile specifier.

#define expand1 a, a, a, a, a, a, a, a, a, a
#define expand2 // ten expand1 here, expand3 to expand5 follows
// expand5 is the equivalent of 1e+005 a, a, ....

struct threeBytes { char x, y, z; };
struct fourBytes { char w, x, y, z; };

int main()
{
   // requires ~1.5sec
   foo<int>();

   // doesn't take time
   foo<threeBytes>();

   // requires ~1.5sec
   foo<fourBytes>();
}

template<typename T>
void foo()
{
   volatile T a;

   // With my setup, the loop does take time and isn't optimized out
   clock_t start = clock();
   for(int i = 0; i < 100000; i++);
   clock_t end = clock();
   int interval = end - start;

   start = clock();
   for(int i = 0; i < 100000; i++) expand5;
   end = clock();

   cout << end - start - interval << endl;
}

Their timings are

I've tested it with different variables (user-defined or not) that are 1 to 8 bytes large, and only the 1-, 2-, 4-, and 8-byte ones take time to run. Is this a bug that exists only with my setup, or is volatile a request to the compiler and not something absolute?

PS: the four-byte versions always take half the time of the others, which is also a source of confusion.

Upvotes: 1

Views: 901

Answers (3)

T.C.

Reputation: 137395

This question is a lot more interesting than it first appears (for some definition of "interesting"). It looks like you've found a compiler bug (or intentional nonconformance), but it isn't quite the one you are expecting.

According to the standard, one of your foo calls has undefined behavior, and the other two are ill-formed. I'll first explain what should happen; the relevant standard quotes can be found after the break. For our purposes, we can just analyze the simple expression statement a, a, a; given volatile T a;.

a, a, a in this expression statement is a discarded-value expression ([stmt.expr]/p1). The type of the expression a, a, a is the type of the right operand, which is the id-expression a, or volatile T; since a is an lvalue, so is the expression a, a, a ([expr.comma]/p1). Thus, this expression is an lvalue of a volatile-qualified type, and it is a "comma expression where the right operand is one of these expressions" - in particular, an id-expression - and therefore [expr]/p11 requires the lvalue-to-rvalue conversion be applied to the expression a, a, a. Similarly, inside a, a, a, the left expression a, a is also a discarded-value expression, and inside this expression the left expression a is also a discarded-value expression; similar logic shows that [expr]/p11 requires the lvalue-to-rvalue conversion be applied to both the result of the expression a, a and the result of the expression a (the leftmost one).

If T is a class type (either threeBytes or fourBytes), applying the lvalue-to-rvalue conversion entails creating a temporary by copy-initialization from the volatile lvalue a ([conv.lval]/p2). However, the implicitly declared copy constructor always takes its argument by a non-volatile reference ([class.copy]/p8); such a reference cannot bind to a volatile object. Therefore, the program is ill-formed.

If T is int, then applying the lvalue-to-rvalue conversion yields the value contained in a. However, in your code, a is never initialized; this evaluation therefore produces an indeterminate value, and per [dcl.init]/p12, results in undefined behavior.


Standard quotes follow. All are from C++14:

[expr]/p11:

In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value expression. The expression is evaluated and its value is discarded. The array-to-pointer (4.2) and function-to-pointer (4.3) standard conversions are not applied. The lvalue-to-rvalue conversion (4.1) is applied if and only if the expression is a glvalue of volatile-qualified type and it is one of the following:

  • ( expression ), where expression is one of these expressions,
  • id-expression (5.1.1),
  • [several inapplicable bullets omitted], or
  • comma expression (5.18) where the right operand is one of these expressions.

[ Note: Using an overloaded operator causes a function call; the above covers only operators with built-in meaning. If the lvalue is of class type, it must have a volatile copy constructor to initialize the temporary that is the result of the lvalue-to-rvalue conversion. —end note ]

[expr.comma]/p1:

A pair of expressions separated by a comma is evaluated left-to-right; the left expression is a discarded-value expression (Clause 5) [...] The type and value of the result are the type and value of the right operand; the result is of the same value category as its right operand [...].

[stmt.expr]/p1:

Expression statements have the form

expression-statement:
    expression_opt;

The expression is a discarded-value expression (Clause 5).

[conv.lval]/p1-2:

1 A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise, the type of the prvalue is T.

2 [some special rules not relevant here] In all other cases, the result of the conversion is determined according to the following rules:

  • [inapplicable bullet omitted]
  • Otherwise, if T has a class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.
  • [inapplicable bullet omitted]
  • Otherwise, the value contained in the object indicated by the glvalue is the prvalue result.

[dcl.init]/p12:

If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17). [...] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases: [certain inapplicable exceptions related to unsigned narrow character types]

[class.copy]/p8:

The implicitly-declared copy constructor for a class X will have the form

X::X(const X&)

if each potentially constructed subobject of a class type M (or array thereof) has a copy constructor whose first parameter is of type const M& or const volatile M&. Otherwise, the implicitly-declared copy constructor will have the form

X::X(X&)

Upvotes: 4

vsoftco

Reputation: 56577

The struct version will probably be optimized out, as the compiler realizes that there are no side effects (no read or write into the variable a), regardless of the volatile. You basically have a no-op, a;, so the compiler can do whatever it pleases; it is not forced to unroll the loop or to optimize it out, so the volatile doesn't really matter here. In the case of ints, there seem to be no optimizations, but this is consistent with the use case of volatile: you should expect non-optimizations only when you have a possible "access to an object" (i.e. read or write) in the loop. However, what constitutes "access to an object" is implementation-defined (although most of the time it follows common sense); see EDIT 3 at the bottom.

Toy example here:

#include <iostream>
#include <chrono>

int main()
{
    volatile int a = 0;

    const std::size_t N = 100000000;

    // side effects, never optimized
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0 ; i < N; ++i)
        ++a; // side effect (write)
    auto end = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              <<  " ms" << std::endl;

    // no side effects, may or may not be optimized out
    start = std::chrono::steady_clock::now();
    for (std::size_t i = 0 ; i < N; ++i)
        a; // no side effect, this is a no-op
    end = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              <<  " ms" << std::endl;
}

EDIT

The no-op is not actually optimized out for scalar types, as you can see in this minimal example. For structs, though, it is optimized out. In the example I linked, clang doesn't optimize the code when optimizations are off, but optimizes out both loops with -O3. gcc doesn't optimize out the loops either when optimizations are off, but optimizes out only the first loop with optimizations on.

EDIT 2

clang spits out a warning: warning: expression result unused; assign into a variable to force a volatile load [-Wunused-volatile-lvalue]. So my initial guess was correct: the compiler can optimize out no-ops, but it is not forced to. Why it does so for structs and not for scalar types is something that I don't understand, but it is the compiler's choice, and it is standard compliant. For some reason it gives this warning only when the no-op is a struct, and doesn't give the warning when it's a scalar type.

Also note that you don't have a "read/write", you only have a no-op, so you shouldn't expect anything from volatile.

EDIT 3

From the golden book (C++ standard)

7.1.6.1/8 The cv-qualifiers [dcl.type.cv]

What constitutes an access to an object that has volatile-qualified type is implementation-defined. ...

So it is up to the compiler to decide when to optimize out the loops. In most cases, it follows common sense: an access is a read of or a write to the object.

Upvotes: 5

Andrew Henle

Reputation: 1

volatile doesn't do what you think it does.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html

If you're relying on volatile outside of the three very specific uses Boehm mentions on the page I linked, you're going to get unexpected results.

Upvotes: 0
