Reputation: 265
The SSE code I have was written for x64, where the stack is 16-byte aligned. Optimised code paths have now been requested for 32-bit x86 (MSVC/Windows and GCC/Linux). I am getting this working on MSVC first.
Now, apart from some inlines that took more than 3 __m128 parameters, which it refused to compile (fixed by making them const references and hoping the compiler will optimize that away), everything seems to work as is.
//error C2719: 'd': formal parameter with __declspec(align('16')) won't be aligned
inline __m128i foo(__m128i a, __m128i b, __m128i c, __m128i d) {...}
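A minimal sketch of the const-reference workaround described above, assuming SSE2 is available. The names sum4 and sum4_lane0 are hypothetical and only stand in for the real inlines; whether the compiler keeps the values in registers after inlining is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (__m128i, _mm_add_epi32, ...)

// Hypothetical helper: adds four vectors of 32-bit lanes.
// Taking the arguments by const reference means no over-aligned value has to
// be passed on the stack, which is what 32-bit MSVC rejects with C2719.
static inline __m128i sum4(const __m128i& a, const __m128i& b,
                           const __m128i& c, const __m128i& d) {
    return _mm_add_epi32(_mm_add_epi32(a, b), _mm_add_epi32(c, d));
}

// Hypothetical wrapper that broadcasts four scalars, sums them lane-wise,
// and extracts lane 0 so the result is easy to check.
static inline std::int32_t sum4_lane0(std::int32_t a, std::int32_t b,
                                      std::int32_t c, std::int32_t d) {
    const __m128i r = sum4(_mm_set1_epi32(a), _mm_set1_epi32(b),
                           _mm_set1_epi32(c), _mm_set1_epi32(d));
    return _mm_cvtsi128_si32(r);
}
```

Binding temporaries to const references, as sum4_lane0 does, is standard C++, so this form should not depend on a compiler extension.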
However, I was under the impression that the stack is not 16-byte aligned on x86 Windows. Yet some __declspec(align(16)) arrays on the stack didn't even get a warning, and I am sure it must be pushing and popping the __m128s (I recall working out that 12 registers were required on x64, and even then it moved some to the stack that it didn't need for a while and did its own thing anyway).
I even added some asserts on the array memory addresses (and turned off NDEBUG) and they all seem to pass.
__declspec(align(16)) uint32_t blocks[64];
assert(((uintptr_t)blocks) % 16 == 0);
__m128i a = ...;
__m128i b = ...;
__m128i c = ...;
__m128i d = ...;
__m128i e = ...;
__m128i f = ...;
__m128i g = ...;
//do other stuff, which surely means there are not enough registers on x86
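For the portability part of the question, one possible sketch of the same check uses the standard C++11 alignas specifier instead of __declspec(align(16)). This is an assumption about what would suit the code base, not something taken from the original code; is_aligned16 and stack_buffer_is_aligned are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: true if p lies on a 16-byte boundary.
static inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) % 16) == 0;
}

// Hypothetical check: declares an over-aligned stack array, writes to it with
// an aligned store, and reads the value back.
static inline bool stack_buffer_is_aligned() {
    alignas(16) std::uint32_t blocks[64] = {};
    const __m128i v = _mm_set1_epi32(7);
    // _mm_store_si128 requires a 16-byte aligned destination address.
    _mm_store_si128(reinterpret_cast<__m128i*>(blocks), v);
    return is_aligned16(blocks) && blocks[3] == 7;
}
```

The compiler is responsible for honouring alignas on locals, realigning the frame if it has to, so the check does not rely on the incoming stack pointer being 16-byte aligned.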
Did I just get really lucky or is there some magic going on here to realign the stack? And is this portable? I am sure I recall having issues getting some D3DX stuff to align on x86 when I was doing D3D9 back with VS2008.
One thing I did get a bunch of warnings for, however, was the __m128 -> __m128& conversions being non-standard. Is this really not supported on some compiler that does support SSE, and how is one meant to avoid it (e.g. inlines with output __m128's, or more than 3 params)?
Also, a quick look suggests that MS themselves somehow break these rules (e.g. XMMatrixTransformation http://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixtransformation%28v=vs.85%29.aspx takes 6 SSE objects, the only difference I can see being that they're wrapped in structs)
XMMATRIX XMMatrixTransformation(
[in] XMVECTOR ScalingOrigin,
[in] XMVECTOR ScalingOrientationQuaternion,
[in] XMVECTOR Scaling,
[in] XMVECTOR RotationOrigin,
[in] XMVECTOR RotationQuaternion,
[in] XMVECTOR Translation
);
Upvotes: 2
Views: 1066
Reputation: 13634
The variables on the stack are aligned. As far as I recall, Visual C++ has always properly over-aligned stack variables.
The error you see for the fourth parameter arises because your Visual C++ is not able to pass an over-aligned type by value on the stack. The first three parameters are passed via registers, so they are unaffected.
Use __vectorcall to pass more parameters via registers (six), and to pass the remaining parameters via the stack (thus avoiding the error even for a 7th parameter).
Use the latest Visual C++, which can pass over-aligned types normally (starting with Visual C++ 2017). (A bug was fixed relatively recently, but it concerned passing non-trivially copyable over-aligned types; the xmm types are trivially copyable, so they are fine.)
Better still, use both the latest Visual C++ and __vectorcall :-)
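A rough sketch of the __vectorcall suggestion, assuming a build that targets both MSVC and GCC. VEC_CALL, sum7 and sum7_lane0 are hypothetical names; the macro expands to nothing on compilers that do not define _MSC_VER, since __vectorcall is a Microsoft extension.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Assumption: only MSVC-compatible compilers understand __vectorcall,
// so other compilers fall back to their default calling convention.
#if defined(_MSC_VER)
#define VEC_CALL __vectorcall
#else
#define VEC_CALL
#endif

// Hypothetical function taking seven vectors by value.
static inline __m128i VEC_CALL sum7(__m128i a, __m128i b, __m128i c, __m128i d,
                                    __m128i e, __m128i f, __m128i g) {
    const __m128i abcd = _mm_add_epi32(_mm_add_epi32(a, b), _mm_add_epi32(c, d));
    const __m128i efg = _mm_add_epi32(_mm_add_epi32(e, f), g);
    return _mm_add_epi32(abcd, efg);
}

// Hypothetical wrapper: broadcasts 1..7, sums lane-wise, and returns lane 0.
static inline std::int32_t sum7_lane0() {
    const __m128i r = sum7(_mm_set1_epi32(1), _mm_set1_epi32(2), _mm_set1_epi32(3),
                           _mm_set1_epi32(4), _mm_set1_epi32(5), _mm_set1_epi32(6),
                           _mm_set1_epi32(7));
    return _mm_cvtsi128_si32(r);
}
```

Keeping the calling convention behind a macro leaves the GCC/Linux build unchanged while letting the MSVC build opt in.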
Upvotes: 1