Reputation: 1755
I'm trying to use aligned operations in SSE and I'm having an issue (surprise).
struct _declspec(align(16)) Vec4 {
float x;
float y;
float z;
float w;
};
Vec4 SSE_Add(const Vec4 &a, const Vec4 &b) {
_declspec(align(16)) Vec4 return_val;
_asm {
MOV EAX, a // Load pointers into CPU regs
MOV EBX, b
MOVAPS XMM0, [EAX] // Move aligned vectors to SSE regs (MOVAPS requires 16-byte alignment)
MOVAPS XMM1, [EBX]
ADDPS XMM0, XMM1 // Add vector elements
MOVAPS [return_val], XMM0 // Save the return vector
}
return return_val;
}
I get an access violation at return return_val. Is this an alignment issue? How can I correct this?
Upvotes: 0
Views: 549
Reputation: 1719
I found that the problem is the EBX register. If you push EBX before the asm block and pop it afterwards, the code works. I'm not sure why, though, so if anyone can explain this, please do.
Edit: I looked at the disassembly, and at the beginning of the function the compiler stores the stack pointer in EBX:
mov ebx, esp
So make sure you don't overwrite it inside the asm block.
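If the compiler is indeed keeping the saved stack pointer in EBX, as the disassembly suggests, one way to sidestep the problem is to avoid picking registers by hand at all. The following is a minimal sketch using the SSE intrinsics from <xmmintrin.h>, assuming an x86 target and a C++11 compiler. The name SSE_Add_Intrinsics is made up for this example, and alignas(16) stands in for _declspec(align(16)).

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ps, _mm_add_ps, _mm_store_ps

// alignas(16) is the standard C++11 spelling of _declspec(align(16)).
struct alignas(16) Vec4 {
    float x;
    float y;
    float z;
    float w;
};

// Hypothetical replacement for SSE_Add: same operation, but the compiler
// chooses the registers, so nothing it relies on (such as EBX) is clobbered.
inline Vec4 SSE_Add_Intrinsics(const Vec4 &a, const Vec4 &b) {
    Vec4 result;
    __m128 va = _mm_load_ps(&a.x);                // aligned load, needs 16-byte alignment
    __m128 vb = _mm_load_ps(&b.x);
    _mm_store_ps(&result.x, _mm_add_ps(va, vb));  // element-wise add, then aligned store
    return result;
}
```

Because Vec4 itself is declared with 16-byte alignment, every Vec4 object (including the local result) satisfies the requirement of the aligned load and store.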
Upvotes: 2
Reputation: 949
This is somewhat compiler dependent. Shouldn't the store be written as: movaps return_val, xmm0
Why don't you show us the generated code?
Hand-written inline assembly like this usually ends up a lot worse than what the compiler generates on its own.
So aligned versus unaligned moves (MOVAPS vs. MOVUPS) are the least of your concerns.
Why not just, in portable code, write:
inline void add(const float *__restrict__ a, const float *__restrict__ b, float *__restrict__ r)
{
for (int i = 0; i != 4; ++i) r[i] = a[i] + b[i];
}
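One caveat on "portable": __restrict__ is a GCC/Clang spelling, while MSVC uses __restrict. Here is a small sketch of the same loop that builds on either compiler. The macro VEC_RESTRICT and the name add4 are invented for this example and are not part of any library.

```cpp
// Pick the compiler-specific spelling of the restrict qualifier.
#if defined(_MSC_VER)
#define VEC_RESTRICT __restrict
#else
#define VEC_RESTRICT __restrict__
#endif

// Adds two 4-float vectors element by element. The restrict qualifiers
// promise that a, b and r do not overlap, which leaves the compiler free
// to vectorize the loop.
inline void add4(const float *VEC_RESTRICT a,
                 const float *VEC_RESTRICT b,
                 float *VEC_RESTRICT r) {
    for (int i = 0; i != 4; ++i) {
        r[i] = a[i] + b[i];
    }
}
```

A caller would pass three distinct arrays, for example alignas(16) float a[4], b[4], r[4]; followed by add4(a, b, r);. Whether the loop actually becomes a single ADDPS depends on the compiler and optimization flags, so it is worth checking the generated code.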
Upvotes: 0