Reputation: 86
I've been trying to re-implement some existing vector and matrix classes to use SSE3 commands, and I seem to be running into these "memory access violation" errors whenever I perform a series of operations on an array of vectors. I'm relatively new to SSE, so I've been starting off simple. Here's the entirety of my vector class:
class SSEVector3D
{
public:
SSEVector3D();
SSEVector3D(float x, float y, float z);
SSEVector3D& operator+=(const SSEVector3D& rhs); //< Elementwise Addition
float x() const;
float y() const;
float z() const;
private:
float m_coords[3] __attribute__ ((aligned (16))); //< The x, y and z coordinates
};
So, not a whole lot going on yet, just some constructors, accessors, and one operation. Using my (admittedly limited) knowledge of SSE, I implemented the addition operation as follows:
SSEVector3D& SSEVector3D::operator+=(const SSEVector3D& rhs)
{
__m128 * pLhs = (__m128 *) m_coords;
__m128 * pRhs = (__m128 *) rhs.m_coords;
*pLhs = _mm_add_ps(*pLhs, *pRhs);
return (*this);
}
To speed-test my new vector class against the old one (to see if it's worth re-implementing the whole thing), I created a simple program that generates a random array of SSEVector3D objects and adds them together. Nothing too complicated:
SSEVector3D sseSum(0, 0, 0);
for(i=0; i<sseVectors.size(); i++)
{
sseSum += sseVectors[i];
}
printf("Total: %f %f %f\n", sseSum.x(), sseSum.y(), sseSum.z());
The sseVectors
variable is an std::vector containing elements of type SSEVector3D
, whose components are all initialized to random numbers between -1
and 1
.
Here's the issue I'm having. If the size of sseVectors
is 8,191
or less (a number I arrived at through a lot of trial and error), this runs fine. If the size is 8,192
or more, I get this error when I try to run it:
signal: SIGSEGV, si_code: 0 (memory access violation at address: 0x00000080)
However, if I comment out that print statement at the end, I get no error even if sseVectors
has a size of 8,192 or more.
Is there something wrong with the way I've written this vector class? I'm running Ubuntu 12.04.1 with GCC version 4.6
Upvotes: 3
Views: 1038
Reputation: 18233
The trick is to notice that __m128
is 16 byte aligned. Use _malloc_aligned()
to assure that your float array is correctly aligned, then you can go ahead and cast your float to an array of __m128
. Make sure also that the number of floats you allocate is divisible by four.
Upvotes: 0
Reputation: 8356
First, and foremost, don't do this
__m128 * pLhs = (__m128 *) m_coords;
__m128 * pRhs = (__m128 *) rhs.m_coords;
*pLhs = _mm_add_ps(*pLhs, *pRhs);
With SSE, always do your loads and stores explicitly via the appropriate intrinsics, never by just dereferencing. Instead of storing an array of 3 floats in your class, store a value of type _m128
. That should make the compiler align instances of your class correctly, without any need for align
attributes.
Note, however, that this won't work very well with MSVC. MSVC seems to generally be unable to cope with alignment requirements stronger than 8-byte aligned for by-value arguments :-(. The last time I needed to port SSE code to windows, my solution was to use Intel's C++ compiler for the SSE parts instead of MSVC...
Upvotes: 1