Reputation: 21160
I'm pretty new to C++, and recently I ran across some info on what it means for a variable to be volatile. As far as I understood, it means a read or write to the variable can never be optimized out of existence.
However, a weird situation arises when I declare a volatile variable that isn't 1, 2, 4, or 8 bytes large: the compiler (GNU with C++11 enabled) seemingly ignores the volatile specifier.
#include <ctime>
#include <iostream>
using namespace std;

#define expand1 a, a, a, a, a, a, a, a, a, a
#define expand2 // ten expand1 here, expand3 to expand5 follows
// expand5 is the equivalent of 1e+005 a, a, ....

struct threeBytes { char x, y, z; };
struct fourBytes { char w, x, y, z; };

// Defined before main so that the calls below can see it
template<typename T>
void foo()
{
    volatile T a;
    // With my setup, the loop does take time and isn't optimized out
    clock_t start = clock();
    for(int i = 0; i < 100000; i++);
    clock_t end = clock();
    int interval = end - start;
    start = clock();
    for(int i = 0; i < 100000; i++) expand5;
    end = clock();
    cout << end - start - interval << endl;
}

int main()
{
    // requires ~1.5sec
    foo<int>();
    // doesn't take time
    foo<threeBytes>();
    // requires ~1.5sec
    foo<fourBytes>();
}
Their timings are:
foo<int>(): ~1.5s
foo<threeBytes>(): 0
I've tested it with different variables (user-defined or not) that are 1 to 8 bytes in size, and only the 1-, 2-, 4-, and 8-byte ones take time to run. Is this a bug that exists only with my setup, or is volatile a request to the compiler and not something absolute?
PS: the four-byte versions always take half the time of the others, which is also a source of confusion.
Upvotes: 1
Views: 901
Reputation: 137395
This question is a lot more interesting than it first appears (for some definition of "interesting"). It looks like you've found a compiler bug (or intentional nonconformance), but it isn't quite the one you are expecting.
According to the standard, one of your foo calls has undefined behavior, and the other two are ill-formed. I'll first explain what should happen; the relevant standard quotes can be found after the break. For our purposes, we can just analyze the simple expression statement a, a, a; given volatile T a;.
a, a, a in this expression statement is a discarded-value expression ([stmt.expr]/p1). The type of the expression a, a, a is the type of the right operand, which is the id-expression a, or volatile T; since a is an lvalue, so is the expression a, a, a ([expr.comma]/p1). Thus, this expression is an lvalue of a volatile-qualified type, and it is a "comma expression where the right operand is one of these expressions" - in particular, an id-expression - and therefore [expr]/p11 requires the lvalue-to-rvalue conversion be applied to the expression a, a, a. Similarly, inside a, a, a, the left expression a, a is also a discarded-value expression, and inside this expression the left expression a is also a discarded-value expression; similar logic shows that [expr]/p11 requires the lvalue-to-rvalue conversion be applied to both the result of the expression a, a and the result of the expression a (the leftmost one).
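The type and value category claimed for a, a, a can be checked at compile time. This is only a minimal sketch, assuming a C++11 compiler and T = int; the variable is initialized here (unlike in the question) and the checks are my own illustration, not part of the original code. Some compilers may warn that the left operands have no effect.

```cpp
#include <type_traits>

// Illustrative variable, named `a` to match the discussion above.
volatile int a = 0;

// decltype of a parenthesized lvalue expression of type T yields T&,
// so an lvalue of type `volatile int` is reported as `volatile int&`.
static_assert(std::is_same<decltype((a, a, a)), volatile int&>::value,
              "a, a, a is an lvalue of type volatile int");
static_assert(std::is_same<decltype((a, a)), volatile int&>::value,
              "the left operand a, a is also an lvalue of type volatile int");
```

decltype does not evaluate its operand, so no read of a happens in these checks.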
If T is a class type (either threeBytes or fourBytes), applying the lvalue-to-rvalue conversion entails creating a temporary by copy-initialization from the volatile lvalue a ([conv.lval]/p2). However, the implicitly declared copy constructor always takes its argument by a non-volatile reference ([class.copy]/p8); such a reference cannot bind to a volatile object. Therefore, the program is ill-formed.
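The binding problem can be observed with type traits. The following is a rough sketch, assuming a C++11 compiler; threeBytesV is a hypothetical variant I made up to show what a "volatile copy constructor" looks like, and it does not appear in the question.

```cpp
#include <type_traits>

struct threeBytes { char x, y, z; };

// Hypothetical variant whose user-declared copy constructor accepts
// volatile (and const) sources.
struct threeBytesV {
    char x, y, z;
    threeBytesV() : x(0), y(0), z(0) {}
    threeBytesV(const volatile threeBytesV& o) : x(o.x), y(o.y), z(o.z) {}
};

// The implicit copy constructor takes `const threeBytes&`, which cannot
// bind to a volatile lvalue, so construction from one is not possible.
static_assert(!std::is_constructible<threeBytes, volatile threeBytes&>::value,
              "implicit copy constructor rejects volatile sources");

// With the user-declared constructor, a volatile lvalue is accepted.
static_assert(std::is_constructible<threeBytesV, volatile threeBytesV&>::value,
              "volatile copy constructor accepts volatile sources");
```

One trade-off of this sketch: a reference to const volatile cannot bind to a temporary, so threeBytesV as written can no longer be copied from rvalues.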
If T is int, then applying the lvalue-to-rvalue conversion yields the value contained in a. However, in your code, a is never initialized; this evaluation therefore produces an indeterminate value, and per [dcl.init]/p12, results in undefined behavior.
Standard quotes follow. All are from C++14:
[expr]/p11:
In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value expression. The expression is evaluated and its value is discarded. The array-to-pointer (4.2) and function-to-pointer (4.3) standard conversions are not applied. The lvalue-to-rvalue conversion (4.1) is applied if and only if the expression is a glvalue of volatile-qualified type and it is one of the following:
- ( expression ), where expression is one of these expressions,
- id-expression (5.1.1),
- [several inapplicable bullets omitted], or
- comma expression (5.18) where the right operand is one of these expressions.
[ Note: Using an overloaded operator causes a function call; the above covers only operators with built-in meaning. If the lvalue is of class type, it must have a volatile copy constructor to initialize the temporary that is the result of the lvalue-to-rvalue conversion. —end note ]
[expr.comma]/p1:
A pair of expressions separated by a comma is evaluated left-to-right; the left expression is a discarded-value expression (Clause 5) [...] The type and value of the result are the type and value of the right operand; the result is of the same value category as its right operand [...].
[stmt.expr]/p1:
Expression statements have the form
expression-statement: expression_opt;
The expression is a discarded-value expression (Clause 5).
[conv.lval]/p1-2:
1 A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise, the type of the prvalue is T.
2 [some special rules not relevant here] In all other cases, the result of the conversion is determined according to the following rules:
- [inapplicable bullet omitted]
- Otherwise, if T has a class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.
- [inapplicable bullet omitted]
- Otherwise, the value contained in the object indicated by the glvalue is the prvalue result.
[dcl.init]/p12:
If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17). [...] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases: [certain inapplicable exceptions related to unsigned narrow character types]
[class.copy]/p8:
The implicitly-declared copy constructor for a class X will have the form X::X(const X&) if each potentially constructed subobject of a class type M (or array thereof) has a copy constructor whose first parameter is of type const M& or const volatile M&. Otherwise, the implicitly-declared copy constructor will have the form X::X(X&)
Upvotes: 4
Reputation: 56577
The struct version will probably be optimized out, as the compiler realizes that there are no side effects (no read or write into the variable a), regardless of the volatile. You basically have a no-op, a;, so the compiler can do whatever it pleases; it is not forced to unroll the loop or to optimize it out, so the volatile doesn't really matter here. In the case of ints, there seem to be no optimizations, but this is consistent with the use case of volatile: you should expect non-optimizations only when you have a possible "access to an object" (i.e. read or write) in the loop. However, what constitutes "access to an object" is implementation-defined (although most of the time it follows common sense); see EDIT 3 at the bottom.
Toy example here:
#include <iostream>
#include <chrono>

int main()
{
    volatile int a = 0;
    const std::size_t N = 100000000;

    // side effects, never optimized
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0 ; i < N; ++i)
        ++a; // side effect (write)
    auto end = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;

    // no side effects, may or may not be optimized out
    start = std::chrono::steady_clock::now();
    for (std::size_t i = 0 ; i < N; ++i)
        a; // no side effect, this is a no-op
    end = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;
}
EDIT
The no-op is not actually optimized out for scalar types, as you can see in this minimal example. For structs, though, it is optimized out. In the example I linked, clang doesn't optimize the code with no optimization, but optimizes both loops with -O3. gcc doesn't optimize out the loops either with no optimizations, but optimizes only the first loop with optimizations on.
EDIT 2
clang spits out a warning: warning: expression result unused; assign into a variable to force a volatile load [-Wunused-volatile-lvalue]. So my initial guess was correct: the compiler can optimize out no-ops, but it is not forced to. Why it does so for structs and not for scalar types is something that I don't understand, but it is the compiler's choice, and it is standard compliant. For some reason it gives this warning only when the no-op is a struct, and doesn't give the warning when it's a scalar type.
Also note that you don't have a "read/write", you only have a no-op, so you shouldn't expect anything from volatile.
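The clang warning suggests assigning into a variable to force a volatile load. A minimal sketch of that idea follows; the helper names sum_of_reads and sum_of_members are hypothetical, and whether a given compiler keeps every read is still subject to what it defines as an access. For the struct, the members are read one at a time, since copying the whole volatile struct would need a copy constructor that accepts volatile sources.

```cpp
#include <cassert>

struct threeBytes { char x, y, z; };

// Hypothetical helper: reads `v` n times, each time through an explicit
// initialization of a local, and returns the sum of the values read.
inline long sum_of_reads(const volatile int& v, int n) {
    assert(n >= 0);
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        int tmp = v;  // explicit read of the volatile object
        sum += tmp;
    }
    return sum;
}

// Hypothetical helper: reads each member of a volatile struct separately.
inline int sum_of_members(const volatile threeBytes& s) {
    char x = s.x;  // each initialization reads one volatile member
    char y = s.y;
    char z = s.z;
    return x + y + z;
}
```

Called with a volatile int holding 7 and n = 3, sum_of_reads returns 21; called with a volatile threeBytes holding {1, 2, 3}, sum_of_members returns 6.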
EDIT 3
From the golden book (C++ standard)
7.1.6.1/8 The cv-qualifiers [dcl.type.cv]
What constitutes an access to an object that has volatile-qualified type is implementation-defined. ...
So it is up to the compiler to decide when to optimize out the loops. In most cases, it follows common sense: an access happens when reading from or writing to the object.
Upvotes: 5
Reputation: 1
volatile doesn't do what you think it does.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html
If you're relying on volatile outside of the three very specific uses Boehm mentions on the page I linked, you're going to get unexpected results.
Upvotes: 0