Reputation: 9090
#include <limits>

using int_type = int;
int_type min = std::numeric_limits<int_type>::min();
int_type max = std::numeric_limits<int_type>::max();

// Convert f to int_type, clamping out-of-range values to min/max.
int_type convert(float f) {
    if(f < static_cast<float>(min)) return min; // overflow
    else if(f > static_cast<float>(max)) return max; // overflow
    else return static_cast<int_type>(f);
}
Is there a more efficient way to convert float f to int_type, while clamping it to the minimal and maximal values of the integer type? For example, without casting min and max to float for the comparisons.
Upvotes: 2
Views: 786
Reputation: 11219
If you want to truncate, you can take advantage of AVX2 and AVX-512 instructions:
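#include <float.h>
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

// Print the eight 32-bit lanes of an AVX integer vector as signed ints.
void print_i32x8(__m256i in) {
    alignas(32) int32_t v[8];
    _mm256_store_si256((__m256i*)v, in);
    printf("v8_i32: %d %d %d %d %d %d %d %d\n", v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7]);
}

int main() {
    // Brace initialisation of __m256 is a GCC/Clang extension.
    __m256 a = {5.423423f, -4.243423f, 423.4234234f, FLT_MAX, 79.4234876f, 19.7f, 8.5454f, 7675675.6f};
    __m256i b = _mm256_cvttps_epi32(a); // truncates; the out-of-range FLT_MAX lane yields 0x80000000
    print_i32x8(b);
}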
Compile with:
g++ -std=c++17 -mavx2 a.cpp && ./a.out
and for AVX-512 (my CPU does not support it, so I cannot provide a working test; feel free to edit):
_mm512_maskz_cvtt_roundpd_epi64(k, value, _MM_FROUND_NO_EXC);
Upvotes: 0
Reputation: 952
For 32-bit integers, you can let the CPU do some of the clamping work for you.
The cvtss2si instruction will actually return 0x80000000 in the case of an out-of-range floating-point number. This lets you eliminate one test most of the time:
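#include <xmmintrin.h>

int convert(float value)
{
    // cvtss2si returns 0x80000000 (INT_MIN) for any out-of-range input or NaN.
    // It rounds using the current rounding mode; _mm_cvttss_si32 truncates instead.
    int result = _mm_cvtss_si32(_mm_load_ss(&value));
    // Only a positive overflow needs fixing up.
    if (result == 0x80000000 && value > 0.0f)
        result = 0x7fffffff;
    return result;
}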
If you have lots of them to convert, then _mm_cvtps_epi32 lets you process four at once (with the same behaviour on overflow). That should be much faster than processing them one at a time, but you'd need to structure the code differently to make use of it.
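A minimal sketch of that four-at-a-time variant, assuming an x86 target with SSE2; the function name convert4 is hypothetical and not from the answer. It relies on the same 0x80000000 overflow result and flips the positive-overflow lanes to 0x7fffffff with an XOR. Like _mm_cvtss_si32, _mm_cvtps_epi32 rounds using the current rounding mode (round-to-nearest by default); _mm_cvttps_epi32 is the truncating form.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE ones)
#include <cstdint>

// Hypothetical helper: convert four floats to four int32 values, clamping
// out-of-range inputs. Assumes x86 with SSE2. NaN lanes yield INT32_MIN.
inline __m128i convert4(__m128 v)
{
    // Out-of-range lanes (and NaN) come back as 0x80000000.
    __m128i r = _mm_cvtps_epi32(v);
    // All-ones mask in the lanes where v >= 2^31, i.e. positive overflow.
    __m128 too_big = _mm_cmpge_ps(v, _mm_set1_ps(2147483648.0f));
    // 0x80000000 ^ 0xffffffff == 0x7fffffff; all other lanes are XORed with 0.
    return _mm_xor_si128(r, _mm_castps_si128(too_big));
}
```

Negative overflow needs no fix-up because 0x80000000 is already the minimum int value. The results can be written out with _mm_store_si128 or _mm_storeu_si128, depending on the alignment of the destination.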
Upvotes: 1
Reputation: 69882
Almost always, trusting the compiler is the best thing to do.
This code:
template<class Integral>
__attribute__((noinline))
int convert(float f)
{
    using int_type = Integral;
    constexpr int_type min = std::numeric_limits<int_type>::min();
    constexpr int_type max = std::numeric_limits<int_type>::max();
    constexpr float fmin = static_cast<float>(min);
    constexpr float fmax = static_cast<float>(max);
    if(f < fmin) return min; // overflow
    if(f > fmax) return max; // overflow
    return static_cast<int_type>(f);
}
compiled with -O2 and -fomit-frame-pointer, yields:
__Z7convertIiEif: ## @_Z7convertIiEif
.cfi_startproc
movl $-2147483648, %eax ## imm = 0xFFFFFFFF80000000
movss LCPI1_0(%rip), %xmm1 ## xmm1 = mem[0],zero,zero,zero
ucomiss %xmm0, %xmm1
ja LBB1_3
movl $2147483647, %eax ## imm = 0x7FFFFFFF
ucomiss LCPI1_1(%rip), %xmm0
ja LBB1_3
cvttss2si %xmm0, %eax
LBB1_3:
retq
I'm not sure it could be any more efficient.
Note that the LCPI1_x constants are defined here:
.section __TEXT,__literal4,4byte_literals
.align 2
LCPI1_0:
.long 3472883712 ## float -2.14748365E+9
LCPI1_1:
.long 1325400064 ## float 2.14748365E+9
How about clamping using fmin(), fmax()... [thanks to njuffa for the question]
The code does become more efficient, because the conditional jumps are removed. However, it starts to behave incorrectly at the clamping limits.
Consider:
template<class Integral>
__attribute__((noinline))
int convert2(float f)
{
    using int_type = Integral;
    constexpr int_type min = std::numeric_limits<int_type>::min();
    constexpr int_type max = std::numeric_limits<int_type>::max();
    constexpr float fmin = static_cast<float>(min);
    constexpr float fmax = static_cast<float>(max);
    f = std::min(f, fmax);
    f = std::max(f, fmin);
    return static_cast<int_type>(f);
}
Calling it with
auto i = convert2<int>(float(std::numeric_limits<int>::max()));
results in:
-2147483648
Clearly we need to reduce the limits by epsilon because of a float's inability to accurately represent the full range of an int, so...
template<class Integral>
__attribute__((noinline))
int convert2(float f)
{
    using int_type = Integral;
    constexpr int_type min = std::numeric_limits<int_type>::min();
    constexpr int_type max = std::numeric_limits<int_type>::max();
    constexpr float fmin = static_cast<float>(min) - (std::numeric_limits<float>::epsilon() * static_cast<float>(min));
    constexpr float fmax = static_cast<float>(max) - (std::numeric_limits<float>::epsilon() * static_cast<float>(max));
    f = std::min(f, fmax);
    f = std::max(f, fmin);
    return static_cast<int_type>(f);
}
Should be better...
except that now the same function call yields:
2147483392
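To see where both numbers come from, it helps to look at the float values around 2^31. The sketch below is only an illustration, assuming a 32-bit int and IEEE 754 single-precision float; the helper names are hypothetical. A float has a 24-bit significand, so just below 2^31 adjacent floats are 128 apart. INT_MAX (2^31 - 1) is not representable and rounds up to exactly 2^31, which is out of range for int. Scaling by epsilon (2^-23) subtracts 256, which lands two floats below 2^31.

```cpp
#include <cmath>
#include <limits>

// float(INT_MAX): 2^31 - 1 is not representable, so it rounds to 2^31.
inline float rounded_int_max()
{
    return static_cast<float>(std::numeric_limits<int>::max());
}

// The upper limit used by the epsilon-adjusted convert2: 2^31 - 2^31 * 2^-23.
inline float epsilon_adjusted_max()
{
    const float fmax = rounded_int_max();
    return fmax - std::numeric_limits<float>::epsilon() * fmax;
}

// The largest float that still fits in a 32-bit int: one step below 2^31.
inline float largest_float_below_int_max()
{
    return std::nextafter(rounded_int_max(), 0.0f);
}
```

Under those assumptions the three helpers return 2147483648, 2147483392 and 2147483520 respectively. Even with the tightest possible limit, a clamp done purely on floats returns 2147483520 for large inputs, not INT_MAX, so a min/max clamp on float values cannot reproduce the result of the branching version.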
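Incidentally, working on this actually led me to a bug in the original code. Because of the same rounding issue, the > and < operators need to be replaced with >= and <=. With >, an input equal to fmax (which is 2^31 for a 32-bit int, one more than max) slips past the check and reaches the out-of-range cast. The corrected version looks like so: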
template<class Integral>
__attribute__((noinline))
int convert(float f)
{
    using int_type = Integral;
    constexpr int_type min = std::numeric_limits<int_type>::min();
    constexpr int_type max = std::numeric_limits<int_type>::max();
    constexpr float fmin = static_cast<float>(min);
    constexpr float fmax = static_cast<float>(max);
    if(f <= fmin) return min; // overflow
    if(f >= fmax) return max; // overflow
    return static_cast<int_type>(f);
}
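One alternative worth considering, offered here as a sketch and not something taken from the answers above, is to widen to double before clamping. Every 32-bit int is exactly representable as a double, so the limits need no rounding fix-ups and std::min/std::max clamp exactly. The name convert_via_double is hypothetical, and mapping NaN to 0 is an arbitrary choice made here, because converting NaN to an integer is undefined behaviour.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical alternative: clamp in double, where both int32 limits are exact.
inline std::int32_t convert_via_double(float f)
{
    if (std::isnan(f)) return 0;  // arbitrary choice; NaN has no integer value
    constexpr double lo = std::numeric_limits<std::int32_t>::min();
    constexpr double hi = std::numeric_limits<std::int32_t>::max();
    // float -> double is exact, so the clamp sees the true value of f.
    const double d = std::min(std::max(static_cast<double>(f), lo), hi);
    return static_cast<std::int32_t>(d);  // truncates toward zero, always in range
}
```

Whether this beats the branching version depends on the compiler and target, so it is worth inspecting the generated assembly in the same way as above before adopting it.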
Upvotes: 1