memory problems when using float4 in thrust cuda

Question

I've run into a memory problem when using float4 with Thrust in CUDA.

Adding the float4 member "buggyVariable" to the functor seems to shift the float members that follow it left by one float.

In CUDA_animateParticles I explicitly set Y to 0 and Z to 1.

Yet when I run the functor and draw the result in OpenGL, I get particles with X position = 1, which indicates that Y is 1 inside the functor.

I've also tested with float2 and float3, and they seem to work fine.

So it looks like either a memory alignment issue or a bug.

Can anyone shed some light on this? Thanks for the help.

#include <thrust/device_ptr.h>
#include <thrust/for_each.h>
#include <thrust/iterator/zip_iterator.h>
#include "cutil_math.h"

struct animateParticles_functor
{
    float4 buggyVariable; //why does adding this variable cause following floats to get wrong values???
    float pex, pey, pez, pew;

    __host__ __device__
    animateParticles_functor( float x, float y, float z, float w) :
        pex(x), pey(y), pez(z), pew(w)
    {
    }

    template <typename Tuple>
    __host__ __device__
    void operator()(Tuple t)
    {
        if(pey > 0)
            thrust::get<0>(t) = make_float4(1, 0, 0, 0); //true if y is bugged
        else
            thrust::get<0>(t) = make_float4(0, 0, 0, 0); //false if its not bugged

        return;
    }
};

void CUDA_animateParticles(float4* cuda_devicePointer_vboPosition, float3* cuda_devicePointer_particleVelocitys, unsigned int numParticles, float4 particleEmitter)
{
    thrust::device_ptr<float4> d_pos(cuda_devicePointer_vboPosition);
    thrust::device_ptr<float3> d_vel(cuda_devicePointer_particleVelocitys);

    thrust::for_each(
        thrust::make_zip_iterator(thrust::make_tuple(d_pos, d_vel)),
        thrust::make_zip_iterator(thrust::make_tuple(d_pos + numParticles, d_vel + numParticles)),
        animateParticles_functor(0, 0, 1, 0) //notice that Y is set to 0 and Z is set to 1
    );
}

Jared Hoberock · Accepted Answer

I think there are some flavors of MSVC where nvcc and cl.exe can't agree on sizeof(float4).

Try replacing your use of float4 with my_float4:

struct my_float4
{
  float x, y, z, w;
};
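
To illustrate why a disagreement about float4 could move the fields that follow it, here is a minimal host-only C++ sketch. It does not use CUDA at all: aligned_float4, plain_float4, with_aligned, with_plain, and the two helper functions are hypothetical stand-ins, with alignas(16) mimicking the 16-byte alignment that CUDA gives float4. It only shows that the same member list lands at different offsets depending on the alignment the compiler assumes. It is not a diagnosis of the exact one-float shift reported in the question.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// Hypothetical stand-in for CUDA's float4, which is declared with 16-byte alignment.
struct alignas(16) aligned_float4 { float x, y, z, w; };

// Same members, but only the natural alignment of float (like my_float4 above).
struct plain_float4 { float x, y, z, w; };

// Functor-like structs. The leading float makes the padding visible.
struct with_aligned { float tag; aligned_float4 v; float pex, pey, pez, pew; };
struct with_plain   { float tag; plain_float4   v; float pex, pey, pez, pew; };

static_assert(sizeof(aligned_float4) == sizeof(plain_float4), "same payload size");
static_assert(alignof(aligned_float4) == 16, "over-aligned stand-in");
static_assert(alignof(plain_float4) == alignof(float), "natural alignment only");

// Number of bytes by which the two layouts disagree about where pey lives.
inline std::size_t pey_offset_gap()
{
    return offsetof(with_aligned, pey) - offsetof(with_plain, pey);
}

// Runtime check that the over-aligned member forces padding after tag.
inline void verify_layouts()
{
    assert(offsetof(with_aligned, v) == 16u);             // padded up to a 16-byte boundary
    assert(offsetof(with_plain, v) == sizeof(float));     // packed right after tag
    assert(pey_offset_gap() == 16u - sizeof(float));      // later fields move by the padding
}
```

If code built by one compiler writes such a struct using one layout and code built by another reads it using the other, every field after the padding is read from the wrong place. A struct without an alignment specifier, such as my_float4, leaves less room for the two compilers to disagree.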
