Eric Foote

Reputation: 86

Memory Access Violations When Using SSE Operations

I've been trying to re-implement some existing vector and matrix classes using SSE3 instructions, and I keep running into "memory access violation" errors whenever I perform a series of operations on an array of vectors. I'm relatively new to SSE, so I'm starting off simple. Here's the entirety of my vector class:

class SSEVector3D
{
public:

   SSEVector3D();
   SSEVector3D(float x, float y, float z);

   SSEVector3D& operator+=(const SSEVector3D& rhs); //< Elementwise Addition

   float x() const;
   float y() const;
   float z() const;

private:

   float m_coords[3] __attribute__ ((aligned (16))); //< The x, y and z coordinates

};

So, not a whole lot going on yet, just some constructors, accessors, and one operation. Using my (admittedly limited) knowledge of SSE, I implemented the addition operation as follows:

SSEVector3D& SSEVector3D::operator+=(const SSEVector3D& rhs) 
{
   __m128 * pLhs = (__m128 *) m_coords;
   __m128 * pRhs = (__m128 *) rhs.m_coords;

   *pLhs = _mm_add_ps(*pLhs, *pRhs);

   return (*this);
}

To speed-test my new vector class against the old one (to see if it's worth re-implementing the whole thing), I created a simple program that generates a random array of SSEVector3D objects and adds them together. Nothing too complicated:

SSEVector3D sseSum(0, 0, 0);

for(size_t i=0; i<sseVectors.size(); i++)
{
   sseSum += sseVectors[i];
}

printf("Total: %f %f %f\n", sseSum.x(), sseSum.y(), sseSum.z());

The sseVectors variable is a std::vector containing elements of type SSEVector3D, whose components are all initialized to random numbers between -1 and 1.

Here's the issue I'm having. If the size of sseVectors is 8,191 or less (a number I arrived at through a lot of trial and error), this runs fine. If the size is 8,192 or more, I get this error when I try to run it:

signal: SIGSEGV, si_code: 0 (memory access violation at address: 0x00000080)

However, if I comment out that print statement at the end, I get no error even if sseVectors has a size of 8,192 or more.

Is there something wrong with the way I've written this vector class? I'm running Ubuntu 12.04.1 with GCC version 4.6.

Upvotes: 3

Views: 1038

Answers (2)

cdiggins

Reputation: 18233

The trick is to notice that __m128 requires 16-byte alignment. Use an aligned allocator such as _mm_malloc() (or _aligned_malloc() with MSVC) to ensure that your float array is correctly aligned; then you can cast your float pointer to a pointer to __m128. Also make sure that the number of floats you allocate is divisible by four.
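As a rough illustration of the aligned-allocation idea, here is a minimal sketch that assumes GCC or Clang on x86, where `<xmmintrin.h>` provides `_mm_malloc()` and `_mm_free()`. The helper names `roundUpToFour`, `allocAlignedFloats`, and `addInPlace` are hypothetical and exist only for this example. It uses explicit aligned loads and stores rather than pointer casts.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>
#include <xmmintrin.h> // SSE intrinsics plus _mm_malloc / _mm_free

// Hypothetical helper: round a float count up to the next multiple of four,
// so the buffer can always be processed in whole __m128 chunks.
std::size_t roundUpToFour(std::size_t n)
{
    return (n + 3) & ~static_cast<std::size_t>(3);
}

// Hypothetical helper: allocate zero-initialised floats on a 16-byte boundary.
// The caller releases the buffer with _mm_free().
float* allocAlignedFloats(std::size_t count)
{
    const std::size_t padded = roundUpToFour(count);
    float* p = static_cast<float*>(_mm_malloc(padded * sizeof(float), 16));
    assert(p != NULL);
    assert(reinterpret_cast<uintptr_t>(p) % 16 == 0);
    for (std::size_t i = 0; i < padded; ++i)
        p[i] = 0.0f;
    return p;
}

// Hypothetical helper: dst[i] += src[i], four floats at a time.
// Both pointers must be 16-byte aligned and paddedCount a multiple of four.
void addInPlace(float* dst, const float* src, std::size_t paddedCount)
{
    for (std::size_t i = 0; i < paddedCount; i += 4)
    {
        const __m128 a = _mm_load_ps(dst + i); // aligned load
        const __m128 b = _mm_load_ps(src + i); // aligned load
        _mm_store_ps(dst + i, _mm_add_ps(a, b)); // aligned store
    }
}
```

Memory obtained from `_mm_malloc()` has to be released with `_mm_free()`, not `free()` or `delete[]`.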

Upvotes: 0

fgp

Reputation: 8356

First and foremost, don't do this:

__m128 * pLhs = (__m128 *) m_coords;
__m128 * pRhs = (__m128 *) rhs.m_coords;
*pLhs = _mm_add_ps(*pLhs, *pRhs);

With SSE, always do your loads and stores explicitly via the appropriate intrinsics, never by just dereferencing. Instead of storing an array of 3 floats in your class, store a value of type __m128. That should make the compiler align instances of your class correctly, without any need for align attributes.

Note, however, that this won't work very well with MSVC. MSVC seems to be generally unable to cope with alignment requirements stricter than 8 bytes for by-value arguments :-(. The last time I needed to port SSE code to Windows, my solution was to use Intel's C++ compiler for the SSE parts instead of MSVC...
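To make the suggestion about storing a __m128 concrete, here is a minimal sketch of how the question's class might look with a `__m128` member and explicit intrinsics. It reuses the class name and public interface from the question. The private `lane()` helper is hypothetical, and the fourth lane is simply assumed to stay at zero.

```cpp
#include <xmmintrin.h> // SSE intrinsics

class SSEVector3D
{
public:
    SSEVector3D() : m_coords(_mm_setzero_ps()) {}

    // _mm_set_ps takes its arguments from the highest lane to the lowest,
    // so x ends up in lane 0 and the unused fourth lane is set to 0.
    SSEVector3D(float x, float y, float z)
        : m_coords(_mm_set_ps(0.0f, z, y, x)) {}

    // Elementwise addition on the register-typed member; no pointer casts.
    SSEVector3D& operator+=(const SSEVector3D& rhs)
    {
        m_coords = _mm_add_ps(m_coords, rhs.m_coords);
        return *this;
    }

    float x() const { return lane(0); }
    float y() const { return lane(1); }
    float z() const { return lane(2); }

private:
    // Hypothetical helper: copy the four lanes out with an explicit
    // unaligned store, then pick one of them.
    float lane(int i) const
    {
        float tmp[4];
        _mm_storeu_ps(tmp, m_coords);
        return tmp[i];
    }

    __m128 m_coords; // x, y, z in lanes 0..2; lane 3 unused
};
```

One caveat, as far as I know: the `__m128` member gives the class a 16-byte alignment requirement for automatic and static objects, but before C++17 heap storage obtained through the default allocator (for example by `std::vector`) is not guaranteed to honour alignment stricter than what `malloc` provides, so on some 32-bit targets an aligned allocator may still be needed.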

Upvotes: 1
