Reputation: 14993
A while ago I read somewhere that SSE intrinsic functions compile into efficient machine code because compilers treat them differently from ordinary functions. I am wondering how compilers actually do this and what C programmers can do to facilitate the process. Are there any guidelines on using intrinsic functions in a way that makes the compiler's job of generating efficient machine code easier?
Thanks.
Upvotes: 8
Views: 1871
Reputation: 7656
Contrary to what Necrolis wrote, the intrinsics may or may not compile down to the instructions they represent. This is especially true for copy or load instructions such as _mm_load_pd, since the compiler is still responsible for register allocation and assignment when using intrinsics. This means that copying a value from one location to another may not be necessary at all, if the two locations can be represented by the same register. In that case the compiler may choose to remove the copy. It may also choose to remove other instructions if the result is never used.
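To make that concrete, here is a minimal sketch in C, assuming an x86 target with SSE2. The function name add2 is made up for illustration; the intrinsics themselves are the standard ones from emmintrin.h. An optimizing compiler will likely keep `copy` in the same register as `va` and drop the multiply whose result is never used, but that is the compiler's decision, not something the intrinsics guarantee.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical example: out[i] = a[i] + b[i] for i = 0, 1.
   Unaligned loads/stores are used so callers need no special alignment. */
void add2(const double *a, const double *b, double *out)
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);

    /* A plain copy: the compiler is free to use one register for both. */
    __m128d copy = va;

    /* The result is never used, so the compiler may remove this multiply. */
    __m128d unused = _mm_mul_pd(va, vb);
    (void)unused;

    _mm_storeu_pd(out, _mm_add_pd(copy, vb));
}
```

To see what your compiler actually emits, look at the generated assembly (for example with gcc -O2 -S) rather than assuming a one-to-one mapping.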
Check out this blog post where the behavior of different compilers is compared in practice. It's from 2009, so the details may no longer apply. However, newer compilers are likely to optimize your code more, not less.
As for actually using intrinsics efficiently, the answer is the same as for all other performance optimization: measure, measure and measure. Make sure that you are actually dealing with a hot piece of code, find out why it's slow and then improve it. You are very likely to find that improving your memory access patterns is more important than using intrinsics.
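As a rough sketch of what "measure" can look like, the C code below times a scalar sum against an SSE2 version. It assumes an x86 target with SSE2; the names sum_scalar, sum_sse2 and time_sum are hypothetical, and clock() is only a coarse CPU-time measure, so treat it as a starting point rather than a proper benchmark harness.

```c
#include <stddef.h>
#include <time.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Plain scalar sum of n doubles. */
double sum_scalar(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* SSE2 sum: two doubles per iteration, then a scalar tail. */
double sum_sse2(const double *x, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(x + i));

    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double s = lanes[0] + lanes[1];

    for (; i < n; ++i)  /* leftover element when n is odd */
        s += x[i];
    return s;
}

/* Returns the CPU seconds spent in `reps` calls of f.
   Results are accumulated into *sink so the calls have a visible effect. */
double time_sum(double (*f)(const double *, size_t),
                const double *x, size_t n, int reps, double *sink)
{
    clock_t t0 = clock();
    for (int r = 0; r < reps; ++r)
        *sink += f(x, n);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Run both versions on arrays of realistic size with the optimization flags you actually ship, and compare. For a loop this simple, the two versions may well end up close to each other, and memory bandwidth may turn out to be the limit either way.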
Upvotes: 7
Reputation: 26181
The intrinsics compile down to the instructions they represent; whether this is efficient or not depends on how they are used.
Also, each compiler treats intrinsics a little differently (i.e. it's implementation-specific), but GCC is open source, so you can see how it treats the SSE ones. Open Watcom*, LCC, PCC and TCC* are all open source C compilers; although they don't have SSE intrinsics, they should still have intrinsics, and you can see how they handle them.
I think what you read was related to auto-vectorization of code, something GCC (see this) and ICC are very good at, but they aren't as good as hand-optimized code, at least not yet.
*Might have been updated with support for SSE; I haven't checked lately.
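To illustrate the difference between auto-vectorization and hand-written intrinsics, here is a small sketch in C, assuming an x86 target with SSE. The function names axpy_scalar and axpy_sse are made up for this example. The first version is an ordinary loop that a vectorizing compiler may turn into SSE code by itself; the second spells out the same computation with intrinsics.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Ordinary loop: y[i] = a * x[i] + y[i].
   `restrict` (C99) promises that x and y do not overlap, which
   makes it easier for the compiler to vectorize the loop. */
void axpy_scalar(float a, const float *restrict x, float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* The same computation written by hand, four floats at a time. */
void axpy_sse(float a, const float *x, float *y, size_t n)
{
    __m128 va = _mm_set1_ps(a);  /* broadcast a into all four lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
    for (; i < n; ++i)  /* scalar tail for the remaining elements */
        y[i] = a * x[i] + y[i];
}
```

With GCC, compiling the first function at -O3 and adding -fopt-info-vec should report whether the loop was vectorized. Whether the hand-written version is any faster is something to measure, not assume.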
Upvotes: 6