Reputation: 2851
I have a question I had been given a while ago during the job interview, I was wandering about the data processor cache. The question itself was connected with volatile variable, how can we not optimize the memory access for those variables. From my understanding when we read the volatile variable we need to omit the processor cache. And this is what my question is about. What is happening in such cases, is entire cache being flushed when the access for such variable is executed? Or there is some register setting that caching should be omitted for a memory region? Or is there a function for reading memory without looking in the cache? Or is it architecture dependent.
Thanks in advance for your time and answers.
Upvotes: 0
Views: 125
Reputation: 19706
There is some confusion here - the memory your program uses (through the compiler), is in fact an abstraction, maintained together by the OS and the processor. As such, you don't "need" to worry about paging, swapping, physical address space and performance.
Wait, before you jump and yell at me for talking nonesence - that was not to say you shouldn't care about them, when optimizing your code you might want to know what actually happens, so you have a set of tools to assist you (SW prefetches for example), as well as a rough idea on how the system works (cache sizes and hierarchy), allowing you to write optimized code. However, as I said, you don't have to worry about this, and if you don't - it's guaranteed to work "under the hood", to an extent. The cache for example, is guaranteed to maintain coherency even when working with shared data (that's maintained through a set of pretty complicated HW protocols), and even in cases of virtual address aliases (multiple virt addresses pointing to the same physical one). But here comes the "to an extent" part - in some cases you have to make sure you use it correctly. If you want to do memory-mapped IO for e.g., you should define it properly so that the processor knows it shouldn't be cached. The compiler isn't likely to do this for you implicitly, it probably won't even know.
Now, volatile
lives in an upper level, it's part of the contract between the programmer and his compiler. It means the compiler isn't allowed to do all sorts of optimizations with this variable, that would be unsafe for the program even within the memory model abstraction. These are basically cases where the value can be modified externally at any point (through interrupt, mmio, other threads, ...). Keep in mind that the compiler still lives above the memory abstraction, if it decides to write something to memory or read it, aside from possible hints it relies completely on the processor to do whatever it needs to make this chunk of memory close at hand while maintaining correctness. However, a compiler is allowed much more freedom than the HW - it could decide to move reads/writes or eliminate variables alltogether, something which the CPU in most cases isn't allowed to, so you need to prevent that from happening if it's unsafe. Some nice examples of when that happens can be found here - http://www.barrgroup.com/Embedded-Systems/How-To/C-Volatile-Keyword
So while a volatile hint limits the freedom of the compiler inside the memory model, it doesn't necessarily limits the underlying HW. You probably don't want it to - say you have a volatile variable that you want to expose to other threads - if the compiler made it uncacheable it would ruin the performance (and without need). If on top of that you also want to protect the memory model from unsafe caching (which are just a subset of the cases volatile might come in handy), you'll have to do so explicitly.
EDIT: I felt bad for not adding any example, so to make it clearer - consider the following code:
int main() {
int n = 20;
int sum = 0;
int x = 1;
/*volatile */ int* px = &x;
while (sum < n) {
sum+= *px;
printf("%d\n", sum);
}
return 0;
}
This would count from 1 to 20 in jumps of x, which is 1. Let's see how gcc -O3
writes it:
0000000000400440 <main>:
400440: 53 push %rbx
400441: 31 db xor %ebx,%ebx
400443: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
400448: 83 c3 01 add $0x1,%ebx
40044b: 31 c0 xor %eax,%eax
40044d: be 3c 06 40 00 mov $0x40063c,%esi
400452: 89 da mov %ebx,%edx
400454: bf 01 00 00 00 mov $0x1,%edi
400459: e8 d2 ff ff ff callq 400430 <__printf_chk@plt>
40045e: 83 fb 14 cmp $0x14,%ebx
400461: 75 e5 jne 400448 <main+0x8>
400463: 31 c0 xor %eax,%eax
400465: 5b pop %rbx
400466: c3 retq
note the add $0x1,%ebx
- since the variable is considered "safe" enough by the compiler (volatile is commented out here), it allows itself to consider it as loop invariant. In fact, if I had not printed something on each iteration, the entire loop would have been optimized away since gcc can tell the final outcome pretty easily.
However, uncommenting the volatile keyword, we get -
0000000000400440 <main>:
400440: 53 push %rbx
400441: 31 db xor %ebx,%ebx
400443: 48 83 ec 10 sub $0x10,%rsp
400447: c7 04 24 01 00 00 00 movl $0x1,(%rsp)
40044e: 66 90 xchg %ax,%ax
400450: 8b 04 24 mov (%rsp),%eax
400453: be 4c 06 40 00 mov $0x40064c,%esi
400458: bf 01 00 00 00 mov $0x1,%edi
40045d: 01 c3 add %eax,%ebx
40045f: 31 c0 xor %eax,%eax
400461: 89 da mov %ebx,%edx
400463: e8 c8 ff ff ff callq 400430 <__printf_chk@plt>
400468: 83 fb 13 cmp $0x13,%ebx
40046b: 7e e3 jle 400450 <main+0x10>
40046d: 48 83 c4 10 add $0x10,%rsp
400471: 31 c0 xor %eax,%eax
400473: 5b pop %rbx
400474: c3 retq
400475: 90 nop
now the add operand is being read from the stack, as the compilers is led to suspect someone might change it. It's still caches, and as a normal writeback-typed memory it would catch any attempt to modify it from another thread or DMA, and the memory system would provide the new value (most likely the cache line would be snooped and invalidated, forcing the CPU to fetch the new value from whichever core owns it now). However, as I said, if x should not have been a normal cacheable memory address, but rather ment to be some MMIO or something else that might change silently beneath the memory system - then the cached value would be wrong (that's why MMIO shouldn't be cached), and the compiler would never know that even though it's considered volatile.
By the way - using volatile int x
and adding it directly would produce the same result. Then again - making x or px global variables would also do that, the reason being - the compiler would suspect that someone might have access to it, and therefore would take the same precautions as with an explicit volatile hint. Interestingly enuogh, the same goes for making x local, but copying its address into a global pointer (but still using x directly in the main loop). The compiler is quite cautious.
That is not to say it's 100% full proof, you could in theory keep x local, have the compiler do the optimizations, and then "guess" the address somewhere from the outside (another thread for e.g.). This is when volatile does come in handy.
Upvotes: 2
Reputation: 12658
volatile variable, how can we not optimize the memory access for those variables.
Yes, Volatile
on variable tells the compiler that the variable can be read or write in such a way that programmer can foresee what could happen to this variable out of programs scope and cannot seen by the compiler. This means that compiler cannot perform optimizations on the variable which will alter the intended functionality, caching its value in a register to avoid memory access using the register copy during each iteration.
`entire cache being flushed when the access for such variable is executed?`
No. Ideally compiler access variable from the variable's storage location which doesn't flush the existing cache entries between CPU and memory.
Or there is some register setting that caching should be omitted for a memory region?
Apparently when the register is in un-chached memory space, accessing that memory variable will give you the up-to-date value than from cache memory. Again this should be architecture dependent.
Upvotes: 0