Reputation: 9090

Efficient float to int without overflow

#include <limits>

using int_type = int;
int_type min = std::numeric_limits<int_type>::min();
int_type max = std::numeric_limits<int_type>::max();

int_type convert(float f) {
    if(f < static_cast<float>(min)) return min; // overflow
    else if(f > static_cast<float>(max)) return max; // overflow
    else return static_cast<int_type>(f);
}

Is there a more efficient way to convert float f to int_type, while clamping it to the minimal and maximal values of the integer type? For example, without casting min and max to float for the comparisons.

Upvotes: 2

Answers (3)

Antonin GAVREL

Reputation: 11219

If you want to truncate, you can take advantage of AVX2 and AVX-512 instructions:

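#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <float.h>

// Print the eight 32-bit lanes of a 256-bit integer vector as signed integers.
void p256_i32(__m256i in) {
    alignas(32) int32_t v[8];
    _mm256_store_si256((__m256i*)v, in);
    printf("v8_i32: %d %d %d %d %d %d %d %d\n", v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7]);
}

int main() {
    __m256 a = {5.423423f, -4.243423f, 423.4234234f, FLT_MAX, 79.4234876f, 19.7f, 8.5454f, 7675675.6f};
    // Truncating conversion; an out-of-range lane such as FLT_MAX becomes 0x80000000 (INT_MIN).
    __m256i b = _mm256_cvttps_epi32(a);
    p256_i32(b);
}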

Compile with:

g++ -std=c++17 -mavx2  a.cpp && ./a.out

and for AVX-512 (my CPU does not support it, so I will not provide a working test; feel free to edit):

_mm512_maskz_cvtt_roundpd_epi64(k, value, _MM_FROUND_NO_EXC);

Upvotes: 0

Adam

Reputation: 952

For 32-bit integers, you can let the CPU do some of the clamping work for you.

The cvtss2si instruction will actually return 0x80000000 in the case of an out of range floating point number. This lets you eliminate one test most of the time:

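#include <xmmintrin.h>  // _mm_load_ss, _mm_cvtss_si32

int convert(float value)
{
    // cvtss2si rounds using the current rounding mode and returns 0x80000000
    // (INT_MIN) when the input is out of range or NaN.
    int result = _mm_cvtss_si32(_mm_load_ss(&value));
    // INT_MIN from a positive input means positive overflow, so saturate to INT_MAX.
    if (result == 0x80000000 && value > 0.0f)
        result = 0x7fffffff;
    return result;
}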

If you have lots of them to convert, then _mm_cvtps_epi32 lets you process four at once (with the same behaviour on overflow). That should be much faster than processing them one at a time, but you'd need to structure the code differently to make use of it.
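The four-at-once idea can be sketched roughly as follows, assuming an x86 target with SSE2 and 32-bit int. The names convert4 and convert4_array are made up for this illustration. The compare-and-XOR fix-up is one possible way to turn the 0x80000000 result of a positive overflow into INT_MAX; it is not necessarily what the answer's author had in mind.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Convert four floats to four int32 values, saturating on overflow.
// Like _mm_cvtps_epi32 itself, this rounds using the current rounding mode.
inline __m128i convert4(__m128 v) {
    // Out-of-range (and NaN) lanes come back as 0x80000000, i.e. INT_MIN.
    const __m128i r = _mm_cvtps_epi32(v);
    // All-ones mask in every lane where v >= 2^31 (positive overflow).
    const __m128 too_big = _mm_cmpge_ps(v, _mm_set1_ps(2147483648.0f));
    // 0x80000000 ^ 0xFFFFFFFF = 0x7FFFFFFF, so those lanes become INT_MAX.
    return _mm_xor_si128(r, _mm_castps_si128(too_big));
}

// Hypothetical convenience wrapper: reads in[0..3], writes out[0..3].
inline void convert4_array(const float* in, std::int32_t* out) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out),
                     convert4(_mm_loadu_ps(in)));
}
```

Negative overflow needs no fix-up because 0x80000000 is already INT_MIN. NaN lanes also end up as INT_MIN here.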

Upvotes: 1

Richard Hodges

Reputation: 69882

~~Sometimes~~ Almost always, trusting the compiler is the best thing to do.

This code:

template<class Integral>
__attribute__((noinline))
int convert(float f)
{
    using int_type = Integral;
    constexpr int_type min = std::numeric_limits<int_type>::min();
    constexpr int_type max = std::numeric_limits<int_type>::max();

    constexpr float fmin = static_cast<float>(min);
    constexpr float fmax = static_cast<float>(max);

    if(f < fmin) return min; // overflow
    if(f > fmax) return max; // overflow
    return static_cast<int_type>(f);
}

compiled with -O2 and -fomit-frame-pointer, yields:

__Z7convertIiEif:                       ## @_Z7convertIiEif
    .cfi_startproc
    movl    $-2147483648, %eax      ## imm = 0xFFFFFFFF80000000
    movss   LCPI1_0(%rip), %xmm1    ## xmm1 = mem[0],zero,zero,zero
    ucomiss %xmm0, %xmm1
    ja  LBB1_3
    movl    $2147483647, %eax       ## imm = 0x7FFFFFFF
    ucomiss LCPI1_1(%rip), %xmm0
    ja  LBB1_3
    cvttss2si   %xmm0, %eax
LBB1_3:
    retq

I'm not sure it could be any more efficient.

Note that the LCPI1_x constants are defined here:

    .section    __TEXT,__literal4,4byte_literals
    .align  2
LCPI1_0:
    .long   3472883712              ## float -2.14748365E+9
LCPI1_1:
    .long   1325400064              ## float 2.14748365E+9

How about clamping using fmin(), fmax()... [thanks to njuffa for the question]

The code does become more efficient, because the conditional jumps are removed. However, it starts to behave incorrectly at the clamping limits.

Consider:

template<class Integral>
__attribute__((noinline))
int convert2(float f)
{
    using int_type = Integral;
    constexpr int_type min = std::numeric_limits<int_type>::min();
    constexpr int_type max = std::numeric_limits<int_type>::max();

    constexpr float fmin = static_cast<float>(min);
    constexpr float fmax = static_cast<float>(max);

    f = std::min(f, fmax);
    f = std::max(f, fmin);
    return static_cast<int_type>(f);
}

Calling it with

auto i = convert2<int>(float(std::numeric_limits<int>::max()));

results in:

-2147483648
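The reason can be checked at compile time. This is a small sketch that assumes 32-bit int, IEEE-754 single-precision float, and a compiler that rounds the int-to-float conversion to nearest (GCC and Clang do). The constant names are made up for the illustration.

```cpp
#include <limits>

// Assumes 32-bit int and IEEE-754 binary32 float.
constexpr float int_max_as_float =
    static_cast<float>(std::numeric_limits<int>::max());
constexpr float int_min_as_float =
    static_cast<float>(std::numeric_limits<int>::min());

// INT_MAX = 2^31 - 1 needs 31 significant bits; float has 24,
// so the conversion rounds up to 2^31.
static_assert(int_max_as_float == 2147483648.0f, "INT_MAX rounds up to 2^31");
// INT_MIN = -2^31 is a power of two and converts exactly.
static_assert(int_min_as_float == -2147483648.0f, "INT_MIN is exact");
```

So after the clamp, f can still be 2^31, which is outside the range of int. Converting it is undefined behaviour; on x86, cvttss2si returns 0x80000000 for it, which is the -2147483648 seen here.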

Clearly we need to reduce the limits by epsilon because of a float's inability to accurately represent the full range of an int, so...

template<class Integral>
__attribute__((noinline))
int convert2(float f)
{
    using int_type = Integral;
    constexpr int_type min = std::numeric_limits<int_type>::min();
    constexpr int_type max = std::numeric_limits<int_type>::max();

    constexpr float fmin = static_cast<float>(min) - (std::numeric_limits<float>::epsilon() * static_cast<float>(min));
    constexpr float fmax = static_cast<float>(max) - (std::numeric_limits<float>::epsilon() * static_cast<float>(max));

    f = std::min(f, fmax);
    f = std::max(f, fmin);
    return static_cast<int_type>(f);
}

Should be better...

except that now the same function call yields:

2147483392
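To see where that number comes from, here is a short sketch under the same assumptions (32-bit int, IEEE-754 single-precision float). The function names are hypothetical. Floats just below 2^31 are 128 apart, and epsilon times 2^31 is 256, so the epsilon-scaled limit lands two steps below 2^31. std::nextafter gives the closest possible limit, which is still 127 short of INT_MAX.

```cpp
#include <cmath>
#include <limits>

// The limit used above: 2^31 - epsilon * 2^31 = 2^31 - 256.
inline float epsilon_scaled_limit() {
    const float two31 = 2147483648.0f;
    return two31 - std::numeric_limits<float>::epsilon() * two31;
}

// The largest float strictly below 2^31: 2^31 - 128.
inline float largest_float_below_two31() {
    return std::nextafter(2147483648.0f, 0.0f);
}
```

Whatever upper limit is chosen on the float side, a min/max clamp alone cannot return exactly 2147483647 for large inputs, because that value is not representable as a float.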

Incidentally, working on this led me to a bug in the original code. Because of the same rounding issue, the > and < operators need to be replaced with >= and <=: fmax is rounded up to 2^31, so an input equal to fmax passes the > test and then overflows in the cast.

Like so:

template<class Integral>
__attribute__((noinline))
Integral convert(float f) // returns Integral so wider types are not narrowed to int
{
    using int_type = Integral;
    constexpr int_type min = std::numeric_limits<int_type>::min();
    constexpr int_type max = std::numeric_limits<int_type>::max();

    constexpr float fmin = static_cast<float>(min);
    constexpr float fmax = static_cast<float>(max);

    if(f <= fmin) return min; // overflow
    if(f >= fmax) return max; // overflow (fmax may have rounded up past max)
    return static_cast<int_type>(f);
}
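One input the comparisons do not catch is NaN: both tests are false for it, so it reaches the cast, which is undefined behaviour. A possible variant for 32-bit int is sketched below. The name convert_nan_safe and the choice to map NaN to 0 are assumptions made for this illustration, not part of the answer.

```cpp
#include <cmath>
#include <limits>

// Hypothetical variant of convert() for int that also defines a result for NaN.
inline int convert_nan_safe(float f) {
    constexpr int imin = std::numeric_limits<int>::min();
    constexpr int imax = std::numeric_limits<int>::max();
    if (std::isnan(f)) return 0;                     // assumption: map NaN to 0
    if (f <= static_cast<float>(imin)) return imin;  // negative overflow, -inf
    if (f >= static_cast<float>(imax)) return imax;  // positive overflow, +inf
    return static_cast<int>(f);                      // in range: truncates toward zero
}
```

The extra test costs one more comparison, so whether it is worth it depends on whether NaN can actually reach the function.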

Upvotes: 1
