renger

Reputation: 825

Adding two 4-vectors with SSE using pointers

This piece of code (doubling a 4-vector) works:

__declspec(align(16)) struct vec4 { float a[4]; };

int main()
{
    vec4 c;
    c.a[0]=2;
    c.a[1]=0;
    c.a[2]=0;
    c.a[3]=0;

    __asm {
        movaps xmm1, c

        addps xmm1, xmm1
        movaps c, xmm1
    }
}

But this piece (doing the same but now with a pointer to the aligned data) doesn't:

__declspec(align(16)) struct vec4 { float a[4]; };

int main()
{
    vec4* c = new vec4;
    c->a[0]=2;
    c->a[1]=0;
    c->a[2]=0;
    c->a[3]=0;

    __asm {
        movaps xmm1, c

        addps xmm1, xmm1
        movaps c, xmm1
    }
}

Why?

I need it to work with pointers, because I can't use the aligned data itself as a function argument.

Upvotes: 2

Views: 357

Answers (2)

Michael Gazonda

Reputation: 2864

The problem is that objects created by a heap allocator (like new and malloc) don't follow the alignment you specify. You only get your alignment with a stack allocated object (your first example).

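C++11 added alignas for specifying the alignment of a type, but plain new is not required to honor an alignment stricter than the default one (that guarantee only arrived with aligned new in C++17), and alignas is not implemented by VC++ yet. It'll work with some compilers, and not others.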

You have a couple of options.

The easiest one: create your heap allocated object as you did, and copy it to a stack allocated object before using it:

vec4* c = new vec4;
c->a[0]=2;
c->a[1]=0;
c->a[2]=0;
c->a[3]=0;

vec4 d = *c;
// process with d

The other option is to have your vec4 struct include enough additional memory that it is guaranteed to contain 16 bytes starting at a 16-byte-aligned address. I believe that new guarantees a minimum of 4-byte alignment, so the aligned address is at most 12 bytes past the start, and 16 + 12 = 28 bytes would do it. You would then have to compute the aligned address from the pointer yourself to find where to store the data to be used with SSE.
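A minimal sketch of that second option, under the same assumption that the allocation is at least 4-byte aligned. The names padded_vec4, data and align_up are made up for illustration and are not from the question. The struct holds 7 floats (28 bytes) and hands out a pointer to the first 16-byte-aligned float inside that buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: float is 4 bytes with 4-byte alignment (true on x86).
static_assert(sizeof(float) == 4 && alignof(float) == 4,
              "sketch assumes 4-byte floats");

// Hypothetical helper: round an address up to the next multiple of
// `alignment`, which must be a power of two.
inline void* align_up(void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return reinterpret_cast<void*>((addr + mask) & ~mask);
}

// Hypothetical wrapper: 28 bytes of storage, of which 16 are used.
struct padded_vec4 {
    float raw[7];  // 16 bytes of payload + up to 12 bytes of padding

    // Pointer to 4 floats that start on a 16-byte boundary inside raw.
    float* data() {
        return static_cast<float*>(align_up(raw, 16));
    }
};
```

With this, new padded_vec4 can come from the ordinary heap, and the pointer returned by data() is what you would load into a register before using movaps.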

Upvotes: 0

PTwr

Reputation: 1273

Pointers in inline assembly must be handled according to certain rules, which you can pick up by learning how MOV works.

In assembly, you first need to copy the pointer into a CPU register; then you can use that register to address the memory location. Writing movaps xmm1, c with a pointer c operates on the pointer variable itself, not on the data it points to.

vec4 *d = ...;
__asm {
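    mov eax, d             ; eax = the address stored in d
    movaps xmm1, [eax]     ; load the 16 bytes that d points to

    addps xmm1, xmm1       ; double all four floats
    movaps [eax], xmm1     ; store the result back through the pointer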
}
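For comparison, here is a sketch of the same load, add, store sequence written with SSE intrinsics instead of inline assembly. This swaps in a different technique from the one in the answer. The function name double_vec4 is made up for illustration, and the struct uses alignas, so it assumes a compiler that supports it (on the asker's compiler, __declspec(align(16)) would take its place). The pointer is dereferenced by the intrinsic, so no explicit register copy is needed:

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps

// Assumes alignas support; x86 or x86-64 target with SSE.
struct alignas(16) vec4 { float a[4]; };

// Hypothetical helper: doubles the four floats of *v in place.
// v must point to 16-byte-aligned storage, as movaps requires.
inline void double_vec4(vec4* v) {
    assert(reinterpret_cast<std::uintptr_t>(v) % 16 == 0);
    __m128 x = _mm_load_ps(v->a);  // aligned load from the pointed-to data
    x = _mm_add_ps(x, x);          // x + x for each of the four lanes
    _mm_store_ps(v->a, x);         // aligned store back through the pointer
}
```

Calling double_vec4(&c) on a stack-allocated vec4 c satisfies the alignment requirement; for heap memory the alignment still has to come from somewhere, as discussed in the other answer.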

Upvotes: 0
