Reputation: 1445
Why Visual Studio C++ compiler doesn't optimize by default the following piece of code?
#include "ctime"
#include "iostream"
#define BIG_NUM 10000000000
int main() {
std::clock_t begin = clock();
for (unsigned long long i = 0; i < BIG_NUM; ++i) {
__asm
{
nop
}
}
std::clock_t end = clock();
std::cout << "time: " << double(end - begin) / CLOCKS_PER_SEC;
std::cin.get();
}
Without the __asm block, the measured time is always 0, because the loop is skipped entirely due to compiler optimizations. With the __asm block it takes a few seconds instead.
Is there any compiler flag to optimize inline assembly, or is it not possible for some obscure reason?
Upvotes: 0
Views: 272
Reputation: 9837
Adding some more information to the accepted answer:
1) There are some compilers which can optimize across inline asm - the Xbox 360 compiler could - but these are likely the exception rather than the rule.
2) There are some tools which run optimizations on the compiled binary, e.g. here - these are likely to be able to optimize inline asm.
3) Finally, and probably most appropriately, one of the most popular reasons to add inline asm is to hand-roll math-heavy vectorised SIMD routines which are too complicated for the compiler to generate on its own. If you want this, a much better way is to use intrinsics. Intrinsics give you the best of both worlds - you can hand-roll your tricky routines and THEN let the compiler handle register allocation, unrolling, interleaving, dead-code pruning, etc. for you (a small sketch of such a routine follows the timing example below).
For a good example of intrinsics, see the code below - if INLINE_ASM is defined it takes ~300 ms, otherwise it is optimized away to nothing and takes 0 ms, even though the two versions do a similar thing.
#include <windows.h>
#include <emmintrin.h>   // needed for _mm_add_epi16
#include <iostream>
int main()
{
    auto tc = ::GetTickCount();
    for (int i = 0; i < 1024 * 1024 * 1024; ++i)
    {
#if INLINE_ASM
        // Opaque to the optimizer, so the loop has to stay.
        _asm
        {
            paddw xmm0, xmm0;
        }
#else
        // The compiler understands the intrinsic; its result is unused,
        // so the whole loop is dead code and is removed.
        _mm_add_epi16(__m128i(), __m128i());
#endif
    }
    std::cout << "Took " << ::GetTickCount()-tc << " milli-seconds!" << std::endl;
}
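As an illustration of point 3, here is a rough sketch of mine (not taken from the example above; the function name and the assumption that the length is a multiple of 8 are just for brevity) of what a hand-rolled SSE2 routine looks like with intrinsics - the compiler can still inline it, allocate registers and unroll it:
#include <emmintrin.h>   // SSE2 intrinsics
#include <cstddef>
// Adds two arrays of 16-bit integers, 8 lanes at a time.
// Assumes count is a multiple of 8 to keep the sketch short.
void add_u16_arrays(const unsigned short* a, const unsigned short* b,
                    unsigned short* out, std::size_t count)
{
    for (std::size_t i = 0; i < count; i += 8)
    {
        __m128i va  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i sum = _mm_add_epi16(va, vb);   // the same paddw, but visible to the optimizer
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), sum);
    }
}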
Upvotes: 0
Reputation: 71899
The compiler doesn't really understand inline assembly, and thus assumes it could do anything.
Generally inline assembly is used when you specifically want to optimize some code at a low level. And if you're doing that, why do you expect the compiler to further optimize it?
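To see the contrast concretely, here is a rough sketch (mine, not the asker's code) of the same timing loop written so the compiler can reason about every statement: a volatile sink keeps the loop from being elided at /O2, but the optimizer still knows exactly what the body does instead of having to assume an __asm block might touch anything.
#include <ctime>
#include <iostream>
#define BIG_NUM 10000000000ULL
int main() {
    std::clock_t begin = clock();
    volatile unsigned long long sink = 0;   // volatile: each store must be preserved
    for (unsigned long long i = 0; i < BIG_NUM; ++i) {
        sink = i;   // fully visible to the compiler, unlike an __asm block
    }
    std::clock_t end = clock();
    std::cout << "time: " << double(end - begin) / CLOCKS_PER_SEC;
}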
Upvotes: 5