Reputation: 15
I have a function that converts a floating-point array to an unsigned char array using inline asm. The code was written many years ago. Now I am trying to build the solution for x64, and I understand that _asm is not supported on x64.
What is the best way to remove asm dependency?
Will the latest MS VC compiler optimize this if I write it in C? Does anyone know if there is anything in Boost or the intrinsic functions to accomplish this?
Thanks --Hari
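I solved it with the following code, which is faster than the asm version:
#include <algorithm>  // std::copy

// Each float is implicitly converted to unsigned char (truncation toward zero).
// Values outside 0..255 are not clamped, so the input must already be in range.
inline static void floatTOuchar(float * pInbuf, unsigned char * pOutbuf, long len)
{
    std::copy(pInbuf, pInbuf + len, pOutbuf);
}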
Upvotes: 0
Views: 239
Reputation: 364180
With SSE2, you can use intrinsics to pack from float down to unsigned char, with saturation to the unsigned 0..255 range.
Convert four vectors of floats to vectors of ints, with CVTPS2DQ (_mm_cvtps_epi32) to round to nearest, or convert with truncation (_mm_cvttps_epi32) if you want the default C behaviour for float-to-int conversion, which truncates toward zero.
Then pack those vectors together, first to two vectors of signed 16-bit int with two PACKSSDW (_mm_packs_epi32), then to one vector of unsigned 8-bit int with PACKUSWB (_mm_packus_epi16). Note that PACKUSWB takes signed input, so using SSE4.1 PACKUSDW as the first step just makes things more difficult (extra masking step). int16_t can represent all possible values of uint8_t, so there's no problem.
Store the resulting vector of uint8_t and repeat for the next four vectors of floats.
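The steps above can be sketched roughly as follows. This is only a minimal sketch, not a tuned implementation: the function name float_to_uchar_sse2 is hypothetical, it uses unaligned loads and stores, and it assumes the default round-to-nearest MXCSR mode. Floats outside the int32 range (and NaN) convert to INT_MIN in CVTPS2DQ, so they end up as 0 rather than 255.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE header)
#include <cstddef>

// Hypothetical helper: converts len floats to bytes, saturating to 0..255.
// Rounds to nearest (ties to even) under the default rounding mode.
inline void float_to_uchar_sse2(const float* in, unsigned char* out, std::size_t len)
{
    std::size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        // Four vectors of 4 floats -> four vectors of 4 int32 (CVTPS2DQ).
        __m128i i0 = _mm_cvtps_epi32(_mm_loadu_ps(in + i));
        __m128i i1 = _mm_cvtps_epi32(_mm_loadu_ps(in + i + 4));
        __m128i i2 = _mm_cvtps_epi32(_mm_loadu_ps(in + i + 8));
        __m128i i3 = _mm_cvtps_epi32(_mm_loadu_ps(in + i + 12));

        // int32 -> int16 with signed saturation (PACKSSDW), twice.
        __m128i w0 = _mm_packs_epi32(i0, i1);
        __m128i w1 = _mm_packs_epi32(i2, i3);

        // int16 -> uint8 with unsigned saturation (PACKUSWB).
        __m128i bytes = _mm_packus_epi16(w0, w1);

        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), bytes);
    }
    // Scalar tail for the last (len % 16) elements, same rounding and clamping.
    for (; i < len; ++i) {
        int v = _mm_cvtss_si32(_mm_set_ss(in[i]));
        v = v < 0 ? 0 : (v > 255 ? 255 : v);
        out[i] = static_cast<unsigned char>(v);
    }
}
```

Swapping _mm_cvtps_epi32 for _mm_cvttps_epi32 (and _mm_cvtss_si32 for _mm_cvttss_si32 in the tail) would give truncation instead of rounding.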
Without manual vectorization, normal compiler output is good for code like this:
int ftoi_truncate(float f) { return f; }
cvttss2si eax, xmm0
ret
int dtoi(double d) { return nearbyint(d); }
cvtsd2si eax, xmm0 # only with -ffast-math, though. Without, you get a function call :(
ret
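For comparison, a plain scalar loop that the compiler is free to optimize might look like the sketch below. The name float_to_uchar_scalar is hypothetical, and the clamp-then-truncate policy is an assumption chosen to mirror the saturating packs; whether a given compiler auto-vectorizes it is something to check in the generated asm.

```cpp
#include <cstddef>

// Hypothetical scalar fallback: clamp to 0..255, then truncate toward zero.
inline void float_to_uchar_scalar(const float* in, unsigned char* out, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i) {
        float f = in[i];
        // Clamp before converting: an out-of-range float-to-integer
        // conversion is undefined behaviour in C and C++.
        if (!(f > 0.0f)) {
            f = 0.0f;        // negatives, zero and NaN all map to 0
        } else if (f > 255.0f) {
            f = 255.0f;
        }
        out[i] = static_cast<unsigned char>(f);  // truncates toward zero
    }
}
```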
Upvotes: 1
Reputation: 9452
You can try the following and let me know:
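// Adding 6755399441055744.0 (1.5 * 2^52) leaves the rounded integer in the
// low bits of the double's mantissa. Assumes IEEE-754 doubles, a
// little-endian target, round-to-nearest mode and a result that fits in 32 bits.
inline int float2int( double d )
{
    union Cast
    {
        double d;
        long l;
    };
    // volatile keeps the optimizer from eliding the store/load through the union
    volatile Cast c;
    c.d = d + 6755399441055744.0;
    return c.l;  // low bits hold the integer result
}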
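// Same thing (an alternative to the version above, use one or the other),
// but the cast violates strict aliasing, so it's not always optimizer safe
inline int float2int( double d )
{
    d += 6755399441055744.0;
    return reinterpret_cast<int&>(d);
}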
for(int i = 0; i < HUGE_NUMBER; i++)
int_array[i] = float2int(float_array[i]);
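So the trick is the double parameter: adding 6755399441055744.0 (1.5 × 2^52) leaves the rounded integer in the low bits of the double's mantissa. In the current code, the function rounds the float to the nearest whole number. If you want truncation, the suggestion is to use 6755399441055743.5 (0.5 less), but be aware that this literal is not exactly representable as a double and rounds to the same value as 6755399441055744.0, so verify the results before relying on it.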
Very informative article available at: http://stereopsis.com/sree/fpu2006.html
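As an illustration only, the same magic-number trick can be written with std::memcpy instead of a union or reinterpret_cast, which avoids the aliasing problem. The name double_to_int_magic is hypothetical, and the sketch assumes IEEE-754 doubles, SSE2-style double arithmetic (no x87 extended precision, no -ffast-math), the default round-to-nearest mode, and inputs whose rounded value fits in 32 bits.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: rounds d to the nearest integer (ties to even)
// by adding 1.5 * 2^52 and reading the low 32 bits of the result.
inline std::int32_t double_to_int_magic(double d)
{
    const double magic = 6755399441055744.0;  // 1.5 * 2^52
    const double shifted = d + magic;         // the FP add performs the rounding

    // Copy out the bit pattern without violating strict aliasing.
    std::uint64_t bits;
    std::memcpy(&bits, &shifted, sizeof bits);

    // The low 32 bits of the mantissa are the two's-complement result.
    const std::uint32_t low = static_cast<std::uint32_t>(bits);
    std::int32_t result;
    std::memcpy(&result, &low, sizeof result);
    return result;
}
```

Because the rounding comes from the floating-point addition itself, halfway cases go to the nearest even integer, so 2.5 becomes 2 while 3.5 becomes 4.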
Upvotes: 0