Reputation: 685
Is an empty line of code that ends with a semicolon equivalent to an asm("nop") instruction?
volatile int x = 5;
if(x == 5){
printf("x has not been changed yet\n");
}
else{
;//Is this the same as asm("nop") or __asm nop in windows?
//alternatively could use __asm nop or __nop();
}
I looked at this answer and it makes me not want to use an x86-specific implementation based on inline assembly. Is `__asm nop` the Windows equivalent of GCC's `asm volatile("nop");`?
I can use the void __nop(); function that MSDN seems to recommend, but I don't want to drag in the library if I don't have to. https://learn.microsoft.com/en-us/cpp/intrinsics/nop?view=vs-2017
Is there a cheap, portable way to add a nop instruction that won't get compiled out? I thought an empty semicolon either was nop or compiled out but I can't find any info on it tonight for some reason.
CLARIFICATION EDIT: I can use inline asm to do this for x86 but I would like it to be portable. I can use the Windows library __nop(), but I don't want to import the library into my project; it's undesirable overhead.
I am looking for a clever way to generate a NOP instruction that will not be optimized out (with standard C syntax preferably), that can be made into a macro and used throughout a project, with minimal overhead, and that works (or can easily be improved to work) on Windows/Linux/x86/x64.
Thanks.
Upvotes: 1
Views: 3344
Reputation: 18523
I mean I don't want to add a library just to force the compiler to add a NOP.
... in a way that is independent of compiler settings (such as optimization settings) and in a way that works with all Visual C++ versions (and maybe even other compilers):
No chance: a compiler is free in how it generates code, as long as the resulting assembly has the behavior the C code describes.
And because the `NOP` instruction does not change the behavior of the program, the compiler is free to add it or to leave it out.
Even if you found a way to force the compiler to generate a `NOP`: one update of the compiler, or a Windows update modifying some file, and the compiler might not generate the `NOP` instruction any longer.
I can use inline asm to do this for x86 but I would like it to be portable.
As I wrote above, any way to force the compiler to write a `NOP` would only work on a certain compiler version for a certain CPU.
Using inline assembly or `__nop()` you might cover all compilers of a certain manufacturer (for example: all GNU C compilers or all variants of Visual C++ etc.).
Another question would be: Do you explicitly need the "official" `NOP` instruction or can you live with any instruction that does nothing?
If you could live with any instruction doing (nearly) nothing, reading a global or static `volatile` variable could be a replacement for `NOP`:
static volatile char dummy;
...
else
{
(void)dummy;
}
This should force the compiler to add a `MOV` instruction reading the variable `dummy`.
Background:
If you wrote a device driver, you could link the variable `dummy` to some location where reading the variable has "side-effects". Example: Reading a variable located in VGA video memory can influence the screen content!
Using the `volatile` keyword you do not only tell the compiler that the value of the variable may change at any time, but also that reading the variable may have such effects.
This means that the compiler has to assume that not reading the variable causes the program not to work correctly. It cannot optimize away the (actually unnecessary) `MOV` instruction reading the variable.
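As a rough sketch of how that volatile read could be wrapped in a macro and used in a loop (assuming C99 or later): the names `nop_sink`, `PORTABLE_NOP` and `spin_reads` are made up for illustration and do not come from any library. Which instruction the compiler actually emits for the read depends on the compiler and the target; the only thing the language requires is that the read is not dropped.

```c
#include <assert.h>

/* Hypothetical name: any global or static volatile object would do. */
static volatile char nop_sink;

/* Each expansion performs one volatile read, which the compiler has to keep. */
#define PORTABLE_NOP() ((void)nop_sink)

/* Performs `count` volatile reads and returns how many were performed. */
int spin_reads(int count)
{
    assert(count >= 0);
    int done = 0;
    for (int i = 0; i < count; ++i) {
        PORTABLE_NOP();   /* typically becomes a load from nop_sink */
        ++done;
    }
    return done;
}
```

Note that this is a memory access, not a true no-op, so it costs a load on every expansion.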
Upvotes: 4
Reputation: 365237
Is an empty line of code that ends with a semicolon equivalent to an asm("nop") instruction?
No, of course not. You could have trivially tried it yourself. (On your own machine, or on the Godbolt compiler explorer, https://godbolt.org/)
You wouldn't want innocent CPP macros to introduce a NOP if `FOO(x);` expanded to just `;` because the appropriate definition for `FOO()` in this case was the empty string.
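To make the macro case concrete, here is a small sketch. `TRACE`, `ENABLE_TRACE` and `twice` are hypothetical names chosen for illustration. With `ENABLE_TRACE` undefined, the call site is left as a bare `;`, which is a null statement in C and is not expected to produce any instruction.

```c
/* Hypothetical debug macro: compiled in only when ENABLE_TRACE is defined. */
#ifdef ENABLE_TRACE
#include <stdio.h>
#define TRACE(x) printf("value = %d\n", (x))
#else
#define TRACE(x) /* expands to nothing, leaving just ';' at the call site */
#endif

int twice(int v)
{
    TRACE(v);   /* without ENABLE_TRACE this line is only ';' */
    ;           /* an explicit null statement, equally a no-op for codegen */
    return 2 * v;
}
```

Comparing the compiler's asm output for this function with and without the null statements (for example on Godbolt) should show no difference.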
`__nop()` is not a library function. It's an intrinsic that does exactly what you want. e.g.
#ifdef USE_NOP
#ifdef _MSC_VER
#include <intrin.h>
#define NOP() __nop() // _emit 0x90
#else
// assume __GNUC__ inline asm
#define NOP() asm("nop") // implicitly volatile
#endif
#else
#define NOP() // no NOPs
#endif
int idx(int *arr, int b) {
NOP();
return arr[b];
}
compiles with Clang7.0 -O3 for x86-64 Linux to this asm
idx(int*, int):
nop
movsxd rax, esi # sign extend b
mov eax, dword ptr [rdi + 4*rax]
ret
compiles with 32-bit x86 MSVC 19.16 -O2 -Gv to this asm
int idx(int *,int) PROC ; idx, COMDAT
npad 1 ; pad with a 1 byte NOP
mov eax, DWORD PTR [ecx+edx*4] ; __vectorcall arg regs
ret 0
and compiles with x64 MSVC 19.16 -O2 -Gv to this asm (Godbolt for all of them):
int idx(int *,int) PROC ; idx, COMDAT
movsxd rax, edx
npad 1 ; pad with a 1 byte NOP
mov eax, DWORD PTR [rcx+rax*4] ; x64 __vectorcall arg regs
ret 0
Interestingly, the sign-extension of `b` to 64-bit is done before the NOP. Apparently x64 MSVC requires (by default) that functions start with an instruction that is at least 2 bytes long (after the prologue of 1-byte `push` instructions, maybe?), so they support hot-patching with a `jmp rel8`.
If you use this in a 1-instruction function, you get an `npad 2` (2 byte NOP) before the `npad 1` from x64 MSVC:
int bar(int a, int b) {
__nop();
return a+b;
}
;; x64 MSVC 19.16
int bar(int,int) PROC ; bar, COMDAT
npad 2
npad 1
lea eax, DWORD PTR [rcx+rdx]
ret 0
I'm not sure how aggressively MSVC will reorder the NOP with respect to pure register instructions, but `a^=b;` after the `__nop()` will actually result in `xor ecx, edx` before the NOP instruction.
But wrt. memory access, MSVC decides not to reorder anything to fill that 2-byte slot in this case.
int sink;
int foo(int a, int b) {
__nop();
sink = 1;
//a^=b;
return a+b;
}
;; MSVC 19.16 -O2
int foo(int,int) PROC ; foo, COMDAT
npad 2
npad 1
lea eax, DWORD PTR [rcx+rdx]
mov DWORD PTR int sink, 1 ; sink
ret 0
It does the LEA first, but doesn't move it before the `__nop()`; seems like an obvious missed optimization, but then again if you're inserting `__nop()` instructions then optimization is clearly not the priority.
If you compiled to a `.obj` or `.exe` and disassembled, you'd see a plain `0x90 nop`. But Godbolt doesn't support that for MSVC, only Linux compilers, unfortunately, so all I can do easily is copy the asm text output.
And as you'd expect, with the `__nop()` ifdefed out, the functions compile normally, to the same code but with no `npad` directive.
The `nop` instruction will run as many times as the NOP() macro does in the C abstract machine. Ordering wrt. surrounding non-`volatile` memory accesses is not guaranteed by the optimizer, or wrt. calculations in registers.
If you want it to be a compile-time memory reordering barrier, for GNU C use `asm("nop" ::: "memory");`. For MSVC, that would have to be separate, I assume.
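A hedged sketch of what a barrier variant of the macro might look like. `NOP_BARRIER`, `shared_flag` and `publish` are hypothetical names. The GNU C branch uses the `"memory"` clobber mentioned above; the MSVC branch is an untested assumption that pairs `__nop()` with the (deprecated) `_ReadWriteBarrier()` compiler intrinsic from `<intrin.h>`. Neither branch is a hardware fence; both only constrain compile-time reordering.

```c
#if defined(_MSC_VER)
#include <intrin.h>
/* Assumption: _ReadWriteBarrier() keeps MSVC from moving memory accesses
   across this point at compile time; __nop() emits the NOP itself. */
#define NOP_BARRIER() \
    do { _ReadWriteBarrier(); __nop(); _ReadWriteBarrier(); } while (0)
#else
/* GNU C: the "memory" clobber tells the compiler that memory may be read
   or written here, so it cannot cache or reorder accesses across the asm. */
#define NOP_BARRIER() __asm__ __volatile__("nop" ::: "memory")
#endif

int shared_flag;   /* deliberately non-volatile */

int publish(int value)
{
    shared_flag = value;   /* store has to be emitted before the NOP */
    NOP_BARRIER();
    return shared_flag;    /* compiler must treat this as a fresh load */
}
```

The `"nop"` mnemonic is accepted by the assembler on x86 and several other common targets, but that is a property of each target's assembler, not something the C language guarantees.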
Upvotes: 3