Hariharan Nagarajan

Reputation: 15

Best way to convert floating point array to integer. [Replacing my asm code for x64]

I have a function that converts a floating point array to an unsigned char array. It uses inline asm, and the code was written many years ago. Now I am trying to build the solution for x64, and I understand that _asm is not supported on x64.

What is the best way to remove the asm dependency?

Will the latest MS VC compiler optimize this well if I write it in plain C? Does anyone know of anything in Boost or the intrinsic functions that accomplishes this?

Thanks --Hari

I solved it with the following code, and it is faster than the asm:

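#include <algorithm>  // std::copy

// Converts each float to unsigned char via the implicit conversion
// (truncation toward zero); values outside 0..255 are not handled.
inline static void floatTOuchar(float * pInbuf, unsigned char *  pOutbuf, long len)
{
    std::copy(pInbuf, pInbuf + len, pOutbuf);
}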
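One caveat with the std::copy version: converting a float whose truncated value does not fit in unsigned char is undefined behaviour in C++, so it is only safe when the input is already known to be in 0..255. A minimal sketch of a clamping variant follows; the name float_to_uchar_clamped and the choice to map NaN to 0 are assumptions made for illustration, not part of the original code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: clamps to 0..255 before truncating toward zero.
inline void float_to_uchar_clamped(const float* in, unsigned char* out, std::size_t len)
{
    std::transform(in, in + len, out, [](float f) -> unsigned char {
        if (!(f > 0.0f))  return 0;    // negatives, zero and NaN (assumed policy)
        if (f >= 255.0f)  return 255;  // saturate at the top of the range
        return static_cast<unsigned char>(f);  // in range: truncates toward zero
    });
}
```

Whether the compiler auto-vectorizes this loop depends on the compiler version and flags, so it is worth checking the generated code.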

Upvotes: 0

Views: 239

Answers (2)

Peter Cordes

Reputation: 364180

With SSE2, you can use intrinsics to pack from float down to unsigned char, with unsigned saturation to the 0..255 range.

Convert four vectors of floats to vectors of ints, with CVTPS2DQ (_mm_cvtps_epi32) to round to nearest, or with truncation (CVTTPS2DQ, _mm_cvttps_epi32) if you want C's default behaviour of truncating toward zero.

Then pack those vectors together, first to two vectors of signed 16-bit int with two PACKSSDW (_mm_packs_epi32), then to one vector of unsigned 8-bit int with PACKUSWB (_mm_packus_epi16). Note that PACKUSWB takes signed input, so using SSE4.1 PACKUSDW as the first step just makes things more difficult (extra masking step). int16_t can represent all possible values of uint8_t, so there's no problem.

Store the resulting vector of uint8_t and repeat for the next four vectors of floats.
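The steps above can be sketched roughly as follows. This is an illustration under assumptions, not a tuned implementation: the function name float_to_uchar_sse2 is made up, unaligned loads and stores are used for simplicity, and the scalar tail loop is one possible way to handle lengths that are not a multiple of 16. Note that floats outside the int32 range (and NaN) convert to INT_MIN, so they come out as 0 rather than 255.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE ones)
#include <cstddef>

// Hypothetical helper: round to nearest, then saturate to 0..255.
inline void float_to_uchar_sse2(const float* in, unsigned char* out, std::size_t len)
{
    std::size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        // Four vectors of 4 floats -> four vectors of 4 int32 (CVTPS2DQ)
        __m128i a = _mm_cvtps_epi32(_mm_loadu_ps(in + i));
        __m128i b = _mm_cvtps_epi32(_mm_loadu_ps(in + i + 4));
        __m128i c = _mm_cvtps_epi32(_mm_loadu_ps(in + i + 8));
        __m128i d = _mm_cvtps_epi32(_mm_loadu_ps(in + i + 12));
        // int32 -> int16 with signed saturation (PACKSSDW)
        __m128i ab = _mm_packs_epi32(a, b);
        __m128i cd = _mm_packs_epi32(c, d);
        // int16 -> uint8 with unsigned saturation (PACKUSWB)
        __m128i bytes = _mm_packus_epi16(ab, cd);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), bytes);
    }
    // Scalar tail: CVTSS2SI uses the same rounding as the vector path
    for (; i < len; ++i) {
        int v = _mm_cvtss_si32(_mm_set_ss(in[i]));
        out[i] = static_cast<unsigned char>(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}
```

Swapping _mm_cvtps_epi32 for _mm_cvttps_epi32 (and _mm_cvtss_si32 for _mm_cvttss_si32 in the tail) gives truncation instead of rounding.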


Without manual vectorization, normal compiler output is good for scalar code like this:

int ftoi_truncate(float f) { return f; }
    cvttss2si       eax, xmm0
    ret

int dtoi(double d) { return nearbyint(d); }
    cvtsd2si        eax, xmm0   # only with -ffast-math, though.  Without, you get a function call :(
    ret
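For reference, here is a small sketch of the two scalar rounding behaviours in portable C++. The function names are illustrative assumptions; std::lrint is the standard library function that rounds using the current rounding mode (round-to-nearest-even by default) and returns an integer directly. Whether a given compiler inlines it to a single convert instruction depends on the compiler and its flags, so check the generated asm.

```cpp
#include <cmath>

// Truncation toward zero: what a plain C/C++ cast does.
inline int ftoi_truncate(float f) { return static_cast<int>(f); }

// Round to nearest using the current rounding mode (ties to even by default).
inline long ftoi_nearest(float f) { return std::lrint(f); }
```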

Upvotes: 1

m0bi5

Reputation: 9452

You can try the following and let me know:

inline int float2int( double d )
{
   union Cast
   {
      double d;
      long l;
   };
   volatile Cast c;
   c.d = d + 6755399441055744.0;
   return c.l;
}

// Same thing but it's not always optimizer safe
inline int float2int( double d )
{
   d += 6755399441055744.0;
   return reinterpret_cast<int&>(d);
}

for(int i = 0; i < HUGE_NUMBER; i++)
     int_array[i] = float2int(float_array[i]);

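So the trick is the double parameter. Adding 6755399441055744.0 (1.5 * 2^52) pushes the integer part of the value into the low bits of the double's mantissa, so with the current code the function rounds to the nearest whole number (using the current rounding mode). The variant sometimes given for truncation, 6755399441055743.5 (0.5 less), needs care: doubles of this magnitude are spaced 1 apart, so that literal is not exactly representable and rounds to the same constant. If you want truncation, a plain cast is the safer choice.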
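A minimal sketch of the same magic-number idea that avoids both the union and the reinterpret_cast by copying the bits with std::memcpy. The name double_to_int_magic is an assumption for illustration. It also assumes IEEE 754 doubles, arithmetic done in true double precision (as with SSE2 on x64, not x87 extended precision), the default round-to-nearest mode, and inputs that fit in 32 bits.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: round-to-nearest via the 1.5 * 2^52 magic constant.
inline std::int32_t double_to_int_magic(double d)
{
    d += 6755399441055744.0;               // 1.5 * 2^52: integer lands in the low mantissa bits
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);   // well-defined way to read the bit pattern
    // Keep the low 32 bits; negative results appear in two's complement form.
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(bits));
}
```

Compilers typically turn the memcpy into a single register move, so this should cost no more than the union version.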

A very informative article is available at: http://stereopsis.com/sree/fpu2006.html

Upvotes: 0
