
Reputation: 223

volatile under the hood

I would like some help please to better understand a part of the following passage:
"The volatile keyword qualifier indicates that the variable can be changed outside of the program. For example, an external device may write data to a port. Compilers will sometimes temporarily use a cache, or register, to hold the value in a memory location for optimization purposes. If the external write modifies the memory location, then this change will not be reflected in the cached or register value." (It comes from the book Understanding and Using C Pointers, pp. 178-179.)

The ambiguity I have is between these phrases: "to hold the value in a memory location" and "If the external write modifies the memory location".

My problem is: I get the impression that if an external device writes data to a port, that data will be stored in some location (???), then stored in the register/cache (??), and then in the variable of the C source code. I am misunderstanding something. From what I know, the normal workflow (when data are going from a gadget to the MCU's RAM) should be: external device -> small temporary buffer -> variable in RAM.

#define PORT 0xB0000000
unsigned int volatile * const port = (unsigned int*) PORT;
*port = 0x0BF4; // write to port
value = *port; // read from port

Upvotes: 5

Views: 698

Answers (3)


Reputation: 81277

Before C added the "volatile" keyword, every access to an object that didn't have a register qualifier would result in a load from or store to that object's address. Given the declarations int i,j;, the code:

i+=j;
j+=i;
i+=j;

would load i and j from memory, add them, and store the result to i. It would then load i and j from memory again, add them, and store the result to j. Finally, it would load i and j from memory a third time, add them, and store the result to i. Three statements would thus result in six loads, three adds, and three stores.

If there isn't anything "special" about i and j, something like the following will be more efficient:

register int t1,t2;
t1=i; t2=j;
t1+=t2; t2+=t1; t1+=t2;
i=t1; j=t2;

While that looks like more code, operations on t1 and t2 don't require loads and stores. Thus, the compiler will only have to generate two loads, three adds, and two stores--saving the costs of four loads and a store compared with the original.
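A compilable sketch of the transformation above, assuming i and j are ordinary globals that nothing outside the program touches (the function names original_form and cached_form are illustrative, not from the answer). Both functions leave i and j with the same values, which is exactly what allows a compiler to turn the first into the second on its own.

```c
int i, j; /* ordinary globals: nothing outside the program changes them */

/* Straightforward form: each statement loads i and j and stores one result. */
static void original_form(void)
{
    i += j;
    j += i;
    i += j;
}

/* Hand-cached form: two loads, three adds in locals, two stores. */
static void cached_form(void)
{
    register int t1, t2;
    t1 = i; t2 = j;
    t1 += t2; t2 += t1; t1 += t2;
    i = t1; j = t2;
}
```

Starting from i = 3 and j = 4, both forms end with i == 18 and j == 11; the only difference is how many memory accesses the generated code performs.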

Having a compiler automatically turn the former kind of code into the latter would be helpful, save for one problem: sometimes things that look like variables may be changed in ways the compiler doesn't know about. This can happen either because circuitry other than memory is wired onto the memory bus (many systems have I/O devices which are wired to respond when code tries to read or write certain addresses), or because a machine may respond to external stimuli by dispatching control to a special section of code called an interrupt handler and then resuming whatever it was doing when the interrupt handler returns. Interrupt handlers often read and write variables which may also be accessed by main-line code (indeed, that's one of the reasons they exist), but if code does something like:

while(!data_received)
  ;

and relies upon an interrupt handler setting data_received once data has become available, such code might fail if the compiler replaces it with:

t1 = data_received;
while(!t1)
  ;

which would execute the loop "faster", but would fail to exit the loop when data arrives.
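On a hosted system, a signal handler plays much the same role as the interrupt handler described above. A minimal sketch, assuming standard C signal() and raise() stand in for a hardware interrupt (on_data, wait_for_data and simulate_interrupt_and_wait are hypothetical names): the flag is declared volatile sig_atomic_t, the type the C Standard sanctions for an object assigned by a signal handler, and volatile keeps the compiler from hoisting the load out of the polling loop.

```c
#include <signal.h>

/* Written by the handler; volatile forces main-line code to reload it
   on every test instead of keeping a copy in a register. */
static volatile sig_atomic_t data_received = 0;

/* Stand-in for an interrupt handler. */
static void on_data(int sig)
{
    (void)sig;
    data_received = 1;
}

/* Main-line code: spin until the handler reports that data arrived. */
static void wait_for_data(void)
{
    while (!data_received)
        ;
}

/* Install the handler, then fire the "interrupt" with raise(). */
static int simulate_interrupt_and_wait(void)
{
    if (signal(SIGINT, on_data) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0)
        return -1;
    wait_for_data();
    return data_received;
}
```

raise() runs the handler synchronously, so the loop exits at once here; with a genuinely asynchronous interrupt it would spin until the handler runs, which is only guaranteed to work because of volatile.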

The purpose of volatile is to tell the compiler that certain objects require "special" treatment. Some compilers (the sensible ones, IMHO) will interpret volatile as an indication that accessing the object thus marked might arbitrarily affect everything in the system in ways the compiler doesn't know about, thus allowing constructs like:

extern volatile char * volatile dma_mem;
extern volatile unsigned dma_count, dma_command, dma_busy;

void put_data(char *data, unsigned size)
{
  dma_mem = data;
  dma_count = size;
  // Following will trigger hardware to automatically copy "dma_count"
  // bytes from memory starting at "dma_mem"; dma_busy will read as
  // zero once operation is complete.
  dma_command = OUTPUT_MEMORY; // Exact value depends on hardware
  while (dma_busy) // wait so all external accesses finish before returning
    ;
}

On a compiler which refrains from keeping anything in registers across a volatile access, a function like the above can be used to output data from "ordinary" memory provided that all external accesses are complete before the function returns. If a compiler keeps things in registers even across volatile accesses in the name of optimization, however, such code may fail unless the buffer into which the data is put is also qualified volatile.
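On a compiler that does cache ordinary values across volatile accesses, a common workaround is an explicit compiler barrier. A sketch, assuming GCC or Clang (the empty asm statement with a "memory" clobber is a GCC/Clang extension, not standard C), with ordinary globals standing in for the hypothetical DMA registers so it can run on a PC; the value of OUTPUT_MEMORY is made up.

```c
/* Hypothetical stand-ins for memory-mapped DMA registers; on real hardware
   these would live at fixed addresses given by the vendor header. */
volatile char *volatile dma_mem;
volatile unsigned dma_count, dma_command, dma_busy;

#define OUTPUT_MEMORY 1u /* hypothetical command code */

/* GCC/Clang extension: the compiler must assume any memory may be read or
   written here, so pending stores to ordinary memory are emitted first and
   cached register copies are discarded afterwards. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

void put_data(char *data, unsigned size)
{
    COMPILER_BARRIER();          /* contents of *data must really be in memory */
    dma_mem = data;
    dma_count = size;
    dma_command = OUTPUT_MEMORY; /* would start the transfer on real hardware */
    while (dma_busy)             /* wait until the hardware reports completion */
        ;
    COMPILER_BARRIER();          /* later code must not assume *data is unchanged */
}
```

This only constrains the compiler. On parts with a data cache or write buffer, DMA may also need cache maintenance or a hardware barrier instruction, which this sketch does not cover.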

PS--while volatile can be and often is used for I/O accesses, it often shouldn't(*) be needed nearly as much for those as for things which are affected by interrupts. In many cases, I/O addresses would be defined using constructs like

#define PORTA (*(unsigned char*)0xD000)
#define PORTB (*(unsigned char*)0xD002)

and while the Standard wouldn't require a compiler to treat such addresses as volatile, many compilers would do so anyway because programmers' use of such addresses implies that they know things the compilers don't. By contrast, flags that get set by interrupt handlers look to the compiler like ordinary RAM, and it's only the volatile qualifier that would indicate that there's anything special about them.

(*) I've seen a lot of vendor-supplied header files which don't use volatile for I/O addresses. If a compiler will generate the same code with or without that keyword, adding more verbiage for the compiler to chew through on every build will slow down compilation for no purpose. The authors of the Standard deliberately refrained from requiring that all compilers be suitable for embedded or systems programming, and thus made no effort to forbid behaviors that would make compilers unsuitable for such purposes. Code for a particular purpose should only be expected to work on compilers that are suitable for such purpose; if such code fails on a compiler that is deliberately made less suitable for that purpose, that doesn't mean the code is "broken"--instead it means the compiler is no longer suitable for use with such code.

PS--For a compiler to make any useful optimizations based upon a constant address not being volatile, it would have to either "know" that no other object had been observed as having that same address, or else allow for the possibility that even if two integers x and y are equal, a write to *(uint8_t*)x might not be recognized as affecting *(uint8_t*)y. Since the Standard says that round-tripping a pointer to an integer and back yields something which "compares equal" to the original pointer, but doesn't say it can actually be used for any purpose, that would be conforming but unexpected.

Consider, for example, the following program containing two separate translation units [assume required headers are included]

// First translation unit
extern unsigned char foo;
extern uintptr_t volatile tfoo;

int foo_addr(void)
{
   tfoo = (uintptr_t)&foo;
   return tfoo == 0x12345678;
}

// Second translation unit
int foo_addr(void);
unsigned char foo;
uintptr_t volatile tfoo;
int main(void)
{
  int ok = foo_addr();
  foo = 2;
  if (ok)
    *(unsigned char*)0x12345678 = 4;
  return ok + foo;
}

If foo happens not to have been given the address 0x12345678, no write will occur to address 0x12345678 and the code will return 2 (ok is 0 and foo is 2). If the address of foo is 0x12345678, however, then (unsigned char*)0x12345678 would be a legitimate pointer to foo, and a compiler should be required to recognize the access (so the program returns 5) unless it decides that it doesn't feel like treating round-trip pointer-to-integer conversions as yielding usable pointers.

The easiest way, by far, to regard (unsigned char*)0x12345678 as aliasing everything it would need to alias would be to treat it as volatile and refrain from caching in registers anything whose address has been exposed. Useful optimizations from treating such a variable as not being volatile would be rare unless a compiler is willing to bend pointer semantics.
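A small sketch of the round trip this discussion relies on, assuming the implementation provides uintptr_t (it is optional in the Standard); foo and write_via_integer are illustrative names. The address of foo goes through an integer and back, and a write through the resulting pointer is expected to be visible through foo itself.

```c
#include <stdint.h>

unsigned char foo;

/* Convert &foo to an integer and back, then store through the result.
   The Standard guarantees a void * converted to uintptr_t and back
   compares equal to the original pointer. */
int write_via_integer(unsigned char value)
{
    uintptr_t addr = (uintptr_t)(void *)&foo;
    unsigned char *p = (unsigned char *)(void *)addr;
    foo = 2;
    *p = value;   /* the compiler must not assume foo is still 2 here */
    return foo;
}
```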

Upvotes: 1


Reputation: 71586

As stated by others, these are items that are external to the CPU core itself. It could be RAM, or it could be a memory-mapped peripheral (a UART status register, let's say, or a timer register, etc.).

#define SOME_STATUS_REGA  (*((volatile unsigned int *)0x10008000))
void fun ( void )
{
    while(SOME_STATUS_REGA==0) continue;
}
#define SOME_STATUS_REGB  (*((unsigned int *)0x10008000))
void more_fun ( void )
{
    while(SOME_STATUS_REGB==0) continue;
}

With one target and toolchain, this produces:

00000000 <fun>:
   0:   e59f200c    ldr r2, [pc, #12]   ; 14 <fun+0x14>
   4:   e5923000    ldr r3, [r2]
   8:   e3530000    cmp r3, #0
   c:   0afffffc    beq 4 <fun+0x4>
  10:   e12fff1e    bx  lr
  14:   10008000    andne   r8, r0, r0

00000018 <more_fun>:
  18:   e59f300c    ldr r3, [pc, #12]   ; 2c <more_fun+0x14>
  1c:   e5933000    ldr r3, [r3]
  20:   e3530000    cmp r3, #0
  24:   112fff1e    bxne    lr
  28:   eafffffe    b   28 <more_fun+0x10>
  2c:   10008000    andne   r8, r0, r0

You can see that in the more_fun (non-volatile) case it reads the location once and does the comparison once: if the value is non-zero it returns, otherwise it goes into an infinite loop. The compiler has done what we told it to do. Since, as far as it knows, there is no way that variable can change, there is no reason to burn clock cycles re-reading something that won't change; if it was zero on the first and only read, it will never become non-zero, so this falls into an infinite loop.
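On hardware where a status bit might never appear (a device that failed to power up, say), such a polling loop is often given an upper bound. A minimal sketch, assuming a caller-chosen poll limit, which is not part of the original answer; wait_nonzero is a hypothetical name. The volatile on the pointed-to type is what forces a fresh load on every pass, as in the fun listing.

```c
/* Poll *status until it becomes non-zero or max_polls reads have been made.
   Returns 1 if the value appeared, 0 on timeout. */
int wait_nonzero(const volatile unsigned int *status, unsigned long max_polls)
{
    for (unsigned long n = 0; n < max_polls; n++) {
        if (*status != 0)   /* volatile: re-read from the address every pass */
            return 1;
    }
    return 0;
}
```

On the target from the listings this would be called as wait_nonzero((const volatile unsigned int *)0x10008000, limit); on a PC any ordinary unsigned int can stand in for the register.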

If you make it volatile, you are "asking" the compiler to read or write it every time your code accesses it. You can see this in the fun case: it goes back every time through the loop to read that address to see if it has changed. The volatile keyword is what made the difference between these two behaviors.

It doesn't have to be hardware that changes these values. If you use a global variable to communicate between an ISR and foreground code, then that variable in memory can be changed by the ISR and/or by the foreground code, so both need to treat it as volatile.

You also have the case of a multicore/multithreaded processor where each core/thread independently has access to shared resources. Not only do you need volatile in that situation, but you might also need that RAM to be uncached if the cores do not share the same cache, and you may need hardware and/or software locking if atomic operations are needed (ldrex/strex in the ARM world are the first step for that).
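volatile alone does not make a read-modify-write atomic. Since C11 the portable way to reach ldrex/strex-style primitives is <stdatomic.h> (on ARMv7, compilers typically implement these operations with ldrex/strex loops). A minimal spinlock sketch, assuming a C11 compiler with atomics support; counter_lock, spin_lock, spin_unlock and increment_shared are illustrative names, and this swaps in the C11 API rather than hand-written ldrex/strex.

```c
#include <stdatomic.h>

static atomic_flag counter_lock = ATOMIC_FLAG_INIT;
static unsigned long shared_counter; /* only touched while counter_lock is held */

static void spin_lock(void)
{
    /* test_and_set returns the previous state; spin while another core holds it */
    while (atomic_flag_test_and_set_explicit(&counter_lock, memory_order_acquire))
        ;
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&counter_lock, memory_order_release);
}

unsigned long increment_shared(void)
{
    spin_lock();
    unsigned long v = ++shared_counter;
    spin_unlock();
    return v;
}
```

The acquire/release ordering on the lock is what keeps shared_counter consistent between cores here, so the counter itself does not need to be volatile.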


Another demonstration: the problem is not only with reads but with writes as well. Let's say you have a peripheral where you need to write a config register to set up some mode, then write it again to enable it in that mode. Or you have a hardware interface where each write increments some logic pointer, and you do a series of writes to do something.

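#define SOMETHING1 (*((volatile unsigned char *)0x10002000))
void fun ( void )
{ SOMETHING1 = 5; SOMETHING1 = 5; SOMETHING1 = 6; }
#define SOMETHING2 (*((unsigned char *)0x10002000))
void more_fun ( void )
{ SOMETHING2 = 5; SOMETHING2 = 5; SOMETHING2 = 6; }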

Without volatile, that peripheral is not going to operate properly: the multiple writes to the same pointer/address are considered dead code, and all but the last one are optimized out.

00000000 <fun>:
   0:   e3a02005    mov r2, #5
   4:   e3a01006    mov r1, #6
   8:   e59f300c    ldr r3, [pc, #12]   ; 1c <fun+0x1c>
   c:   e5c32000    strb    r2, [r3]
  10:   e5c32000    strb    r2, [r3]
  14:   e5c31000    strb    r1, [r3]
  18:   e12fff1e    bx  lr
  1c:   10002000    andne   r2, r0, r0

00000020 <more_fun>:
  20:   e3a02006    mov r2, #6
  24:   e59f3004    ldr r3, [pc, #4]    ; 30 <more_fun+0x10>
  28:   e5c32000    strb    r2, [r3]
  2c:   e12fff1e    bx  lr
  30:   10002000    andne   r2, r0, r0


Clang/LLVM demonstrates the problem as well:

#define A (*((volatile unsigned char *)0x10002000))
void afun ( void )
{
    A = 4;
    A = 5;
    A = 6;
    A |= 1;
    while(A==0) continue;
}
#define B (*((unsigned char *)0x10002000))
void bfun ( void )
{
    B = 4;
    B = 5;
    B = 6;
    B |= 1;
    while(B==0) continue;
}


00000000 <afun>:
   0:   e3a00a02    mov r0, #8192   ; 0x2000
   4:   e3a01004    mov r1, #4
   8:   e3800201    orr r0, r0, #268435456  ; 0x10000000
   c:   e5c01000    strb    r1, [r0]
  10:   e3a01005    mov r1, #5
  14:   e5c01000    strb    r1, [r0]
  18:   e3a01006    mov r1, #6
  1c:   e5c01000    strb    r1, [r0]
  20:   e5d01000    ldrb    r1, [r0]
  24:   e3811001    orr r1, r1, #1
  28:   e5c01000    strb    r1, [r0]
  2c:   e5d01000    ldrb    r1, [r0]
  30:   e3510000    cmp r1, #0
  34:   0afffffc    beq 2c <afun+0x2c>
  38:   e12fff1e    bx  lr

0000003c <bfun>:
  3c:   e3a00a02    mov r0, #8192   ; 0x2000
  40:   e3a01007    mov r1, #7
  44:   e3800201    orr r0, r0, #268435456  ; 0x10000000
  48:   e5c01000    strb    r1, [r0]
  4c:   e12fff1e    bx  lr

Leaving off the volatile won't hurt you if you are doing onesy-twosy things that are not in a domain where they can be optimized out (a single write to each register in some sequence, or a single read of a register, "single" also implying no loops). It will most definitely hurt you if you are doing more than one write to the same register (which often happens when configuring a peripheral) or doing a read-modify-write (x |= something, y &= something, z ^= something, etc.).
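Read-modify-write on a peripheral register is often wrapped in small helpers so the volatile qualifier cannot be forgotten at a call site. A sketch, assuming 32-bit registers; reg_set_bits, reg_clear_bits and reg_modify are hypothetical names. Each helper is still a separate read followed by a write, so it is not atomic with respect to an interrupt handler that touches the same register.

```c
#include <stdint.h>

/* Each helper performs exactly one read and one write of *reg. */
static inline void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg |= mask;
}

static inline void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg &= ~mask;
}

/* Replace the bits selected by mask with the corresponding bits of value. */
static inline void reg_modify(volatile uint32_t *reg, uint32_t mask, uint32_t value)
{
    *reg = (*reg & ~mask) | (value & mask);
}
```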

If you are using a toolchain that doesn't have an optimizer, or you choose not to optimize, you won't have this problem, but the code is not portable if you leave the volatiles off. You will eventually run into trouble if you don't habitually mark variables/code that cross compile or other similar domains as volatile (hardware is a separate compile domain from software).

Upvotes: 4


Reputation: 400069

Memory-mapped I/O devices don't go through the CPU core's registers (or, typically, its cache). That's why they're external: they just hang somewhere on the memory bus, pretending to be memory.

So values from such a device will appear directly in what (to the CPU) looks like memory.

In the example you gave, this:

*port = 0x0BF4; // write to port

could perhaps cause an A/D converter to start a conversion, and this

value = *port; // read from port

could read in the resulting value. This is not a very typical design (A/D converters tend to be a bit more complicated than that, and so on) but it's possible.

If a compiler thought "hey, that there is just a read from a location to which this value was written" it might replace the two statements with

value = 0x0BF4; // "optimized", but broken since no more I/O occurs

This would ruin your day, if you were trying to read values from that A/D converter.

Declaring the location volatile tells the compiler to not make any assumptions about the side-effects of accesses to the location.

If you look at something like an STM32F4 ARM-based microcontroller, it has tons of memory-mapped I/O (serial ports, USB controller, Ethernet, timers, A/D and D/A converters, ... they're all there) plus a bunch of internal (to the core, but still memory-mapped) things.
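Vendor headers for parts like this typically describe each peripheral as a struct of volatile members laid over the peripheral's base address. A sketch of that pattern with a made-up UART layout: the register names, offsets, status bit and base address are all hypothetical, not taken from any datasheet.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical register block: each member sits at a fixed offset from the base. */
typedef struct {
    volatile uint32_t SR;   /* status register,  offset 0x00 */
    volatile uint32_t DR;   /* data register,    offset 0x04 */
    volatile uint32_t CR;   /* control register, offset 0x08 */
} uart_regs;

/* Catch accidental padding or reordering at compile time (C11). */
_Static_assert(offsetof(uart_regs, DR) == 0x04, "DR must be at offset 0x04");
_Static_assert(offsetof(uart_regs, CR) == 0x08, "CR must be at offset 0x08");

#define UART_BASE_ADDR 0x40000000u                 /* hypothetical base address */
#define UART ((uart_regs *)UART_BASE_ADDR)         /* overlay used on the target */
#define UART_SR_TXE (1u << 7)                      /* hypothetical "TX empty" bit */

/* Wait until the transmitter has room, then write one byte. */
static void uart_putc(uart_regs *u, char c)
{
    while ((u->SR & UART_SR_TXE) == 0)   /* volatile member: re-read each pass */
        ;
    u->DR = (uint8_t)c;
}
```

On the target the call would be uart_putc(UART, 'A'); passing a pointer parameter instead of using UART directly also lets the routine be exercised against an ordinary struct on a PC.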

Upvotes: 5
