Manu Evans

Reputation: 1178

'Correct' unsigned integer comparison

So, we all know the C/C++ signed/unsigned comparison rules where -1 > 2u == true, and I have a situation where I want to implement 'correct' comparisons efficiently.

My question is: which is more efficient, considering as many architectures as people are familiar with? Obviously Intel and ARM carry the most weight.

Given:

int x;
unsigned int y;
if (x < y) {}

Is it better to promote:

x < y  =>  (int64)x < (int64)y

or is it better to perform 2 comparisons, ie:

x < y  =>  (x < 0 || x < y)

The former implies a zero-extend, a sign-extend, and one comparison+branch; the latter requires no sign-extend operations, but two consecutive cmp+branches.
Traditional wisdom suggests that branches are more expensive than sign extends, and both will pipeline, but in the first case there is a stall between the extends and the single comparison, whereas in the second case I can imagine that some architectures might pipeline the two comparisons, but then be followed by two conditional branches?

Another case exists, where the unsigned value is a smaller type than the signed type, which means it can be done with a single zero-extend to the signed type's length, and then a single comparison... in that case, is it preferable to use the extend+cmp version, or is the 2-comparison method still preferred?
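For that narrower-unsigned case, a minimal sketch of the single-extend version (assuming int is wider than short, as on typical 32- and 64-bit targets; the function name is illustrative):

bool lt(int x, unsigned short y) {
    // unsigned short always fits in the wider int, so one zero-extend
    // followed by a single signed compare gives the exact result
    return x < static_cast<int>(y);
}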

Intel? ARM? Others? I'm not sure there's a right answer here, but I'd like to hear people's takes. Low-level performance is hard to predict these days, especially on Intel and increasingly so on ARM.

Edit:

I should add, there is one obvious resolution, where the types are the same size as the architecture's int width; in that case it is obvious that the 2-comparison solution is preferred, since the promotion itself can't be performed efficiently. Clearly my int example meets this condition for 32-bit architectures, and you can transpose the thought experiment to short to apply the exercise to 32-bit platforms.
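To illustrate the word-width case, a sketch with 64-bit operands on a 64-bit machine (illustrative, not from the original post): there is no cheap wider type to promote to, so the 2-comparison form applies.

#include <cstdint>

bool lt64(std::int64_t x, std::uint64_t y) {
    // no native 128-bit promotion is available, so test the sign first;
    // when x >= 0 the cast to unsigned is value-preserving
    return x < 0 || static_cast<std::uint64_t>(x) < y;
}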

Edit 2:

Sorry, I forgot the u in -1 > 2u! >_<

Edit 3:

I want to amend the situation to assume that the result of the comparison is an actual branch, and the result is NOT returned as a boolean. This is how I'd prefer the structure to look, although it does raise an interesting point: there is another set of permutations when the result is a bool vs. a branch.

int g;
void fun(int x, unsigned int y) { if((long long)x < (long long)y) g = 10; }
void gun(int x, unsigned int y) { if(x < 0 || x < y) g = 10; }

This produces the intended branch typically implied when you encounter an if ;)

Upvotes: 33

Views: 13576

Answers (7)

Ped7g

Reputation: 16596

The two-branch version would certainly be slower, but actually neither variant needs two branches... nor even a single branch... on x86, if you want a boolean integer result.

For example, x86-64 gcc 7.1 will, for this C++ source:

bool compare(int x, unsigned int y) {
    return (x < y); // "wrong" (will emit warning)
}

bool compare2(int x, unsigned int y) {
    return (x < 0 || static_cast<unsigned int>(x) < y);
}

bool compare3(int x, unsigned int y) {
    return static_cast<long long>(x) < static_cast<long long>(y);
}

produce this assembly (godbolt live demo), with the arguments in EDI and ESI and potential garbage in their upper halves:

compare(int, unsigned int):
        cmp     edi, esi
        setb    al
        ret

compare2(int, unsigned int):
        mov     edx, edi
        shr     edx, 31       # sign bit of x
        cmp     edi, esi
        setb    al
        or      eax, edx
        ret

compare3(int, unsigned int):
        movsx   rdi, edi       # sign-extend x
        mov     esi, esi       # zero-extend y
        cmp     rdi, rsi
        setl    al
        ret

And if you were to use these inside more complex code, they would get inlined in 99% of cases, and at least the zero-extension would usually optimize away, because GCC would already know it wrote the value to a 32-bit register, implicitly zeroing the upper half.

Without profiling it's just guessing, but by gut I would go with compare3 as "faster", especially when executed out of order inside some code. (It's sort of funny that it does the proper 32->64 promotion even for the uint argument, while it would take quite some effort to produce code calling that compare with a mess in the upper 32 bits of esi... but when inlined into a more complex calculation it would probably get rid of that, noting the argument is already zero-extended to 64 bits, so compare3 becomes even simpler and shorter.)

... as I said in a comment, I don't hit tasks where I would need this; for example, I can't imagine working on something where the valid range of the data is unknown. So for the tasks I work on, C/C++ is a perfect fit, and I appreciate exactly the way it works: < for signed vs. unsigned types is well defined and results in the shortest/fastest code, plus a warning is emitted to make me, as the programmer, responsible for validating it and changing the source if needed.

Upvotes: 7

Davislor

Reputation: 15144

One portable trick you can do is to check whether you can widen both arguments to intmax_t from <stdint.h>, the widest integral type an implementation supports. You can check (sizeof(intmax_t) >= sizeof(x) && sizeof(intmax_t) > sizeof(y)) and, if so, do a widening conversion. (The unsigned operand needs a strictly wider type; the signed one only needs one at least as wide.) This works in the very common case where int is 32 bits wide and long long int is 64 bits wide.

In C++, you can do clever stuff where you have a safe-comparison template that checks std::numeric_limits<T> on its arguments. Here is one version. (Compile with -Wno-sign-compare on gcc or clang!)

#include <cassert>
#include <cstdint>
#include <limits>

using std::intmax_t;
using std::uintmax_t;

template<typename T, typename U>
  inline bool safe_gt( T x, U y ) {
    constexpr auto tinfo = std::numeric_limits<T>();
    constexpr auto uinfo = std::numeric_limits<U>();
    constexpr auto maxinfo = std::numeric_limits<intmax_t>();

    static_assert(tinfo.is_integer, "");
    static_assert(uinfo.is_integer, "");

    if ( tinfo.is_signed == uinfo.is_signed )
      return x > y;
    else if ( maxinfo.max() >= tinfo.max() &&
              maxinfo.max() >= uinfo.max() )
      return static_cast<intmax_t>(x) > static_cast<intmax_t>(y);
    else if (tinfo.is_signed) // x is signed, y unsigned.
      return x > 0 && x > y;
    else // y is signed, x unsigned.
      return y < 0 || x > y;      
  }

int main()
{
  assert(-2 > 1U);
  assert(!safe_gt(-2, 1U));
  assert(safe_gt(1U, -2));
  assert(safe_gt(1UL, -2L));
  assert(safe_gt(1ULL, -2LL));
  assert(safe_gt(1ULL, -2));
}

It could be made aware of floating-point by changing two lines.

Upvotes: 5

PMcK

Reputation: 413

Have a look at Andrei Alexandrescu's keynote at the recent D conference in Berlin on Design by Introspection.

In it he shows how to design a checked int class at DESIGN time, and one of the features he comes up with is exactly this: how to compare signed and unsigned.

Basically you need to perform 2 comparisons

If (signed_var < 0) then the result is already known (the signed value is the smaller one), else promote/cast signed_var to unsigned and then compare.
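A minimal sketch of that two-step idea for a less-than test (illustrative only, not Alexandrescu's actual code):

bool less_signed_unsigned(int s, unsigned int u) {
    if (s < 0) return true;                   // a negative value is below any unsigned value
    return static_cast<unsigned int>(s) < u;  // s >= 0, so the cast is value-preserving
}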

Upvotes: 0

John Bollinger

Reputation: 180201

Given the specific setup you presented:

int x;
unsigned int y;

and your apparent intent to evaluate whether the value of x is numerically less than that of y, respecting the sign of x, I'd be inclined to write it as

if ((x < 0) || (x < y)) {}

that is, your second alternative. It expresses the intent clearly, and it is extensible to wider types, as long as the maximum representable value of y's type is at least as large as the maximum representable value of x's type. Thus, if you're willing to stipulate that the arguments will have that form then you could even write it as -- avert your eyes, C++ adherents -- a macro.
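Such a macro might look like this (a sketch; the name is hypothetical, and it assumes x is signed while y has an unsigned type at least as wide):

/* true iff x is numerically less than y */
#define SAFE_LT(x, y) (((x) < 0) || ((x) < (y)))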

Converting both arguments to a signed 64-bit integer type is not a portable solution, because there is no guarantee that doing so would in fact be a widening conversion from either int or unsigned int. It also is not extensible to wider types.

As for the relative performance of your two alternatives, I doubt there's much difference, but if it matters to you then you would want to write a careful benchmark. I could imagine the portable alternative requiring one more machine instruction than the other, and I can also imagine it requiring one less. Only if such comparisons dominate the performance of your application would a single instruction make a noticeable difference one way or the other.

Of course, this is specific to the situation you presented. If you want to handle mixed signed / unsigned comparisons in either order, for many different types, as sorted out at compile time, then a template-based wrapper could help you with that (and that would moot the question of using a macro), but I take you to be asking specifically about details of the comparison itself.

Upvotes: 6

Lundin

Reputation: 213809

You have to judge this on a case-by-case basis. There are several reasons why signed types would be used in a program:

  1. Because you actually need to have negative numbers in calculations or output.
  2. "Sloppy typing", meaning that the programmer just types int all over their program without giving it much thought.
  3. Accidentally signed. The programmer didn't actually want signed numbers, but ended up with them by accident, either through implicit type promotions or by using integer constants such as 0, which is of type int.

In case 1), the arithmetic should be carried out with signed types. You should then use the smallest possible type that is needed to contain the maximum expected values.

Suppose for example that a value can range from -10000 to 10000. You will then need a 16-bit signed type to represent it. The correct type to use then, platform-independently, is int_fast16_t.

The int_fastn_t and uint_fastn_t types require that the type is at least n bits large, but the compiler is allowed to pick a larger type if that gives faster code or better alignment.
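For example (a sketch; the variable name is illustrative):

#include <stdint.h>

int_fast16_t altitude = -10000;  /* at least 16 bits, possibly wider if that is faster */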


2) is cured by studying stdint.h and by not being lazy. As a programmer, one always needs to consider the size and signedness of every single variable declared in the program, at the point of declaration. Or, if you get some manner of revelation later, go back and change the type.

If you don't consider the types carefully, you will with absolute certainty end up writing numerous, often subtle bugs. This is particularly important in C++, which is more picky about type correctness than C.

When "sloppy typing" is used, the actual intended type is most often unsigned rather than signed. Consider this sloppy typing example:

for(int i=0; i<n; i++)

It makes no sense at all to use a signed int here, so why would you? Most likely you are iterating over an array or container, in which case the correct type to use is size_t.

Or alternatively, if you know the maximum size that n can hold, for example 100, then you can use the most suitable type for that:

for(uint_fast8_t i=0; i<100; i++)

3) is also cured by studying, notably the various rules for implicit promotions that exist in these languages, such as the usual arithmetic conversions and integer promotion.
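As a sketch of how one ends up "accidentally signed" through integer promotion (assuming the common case of 16-bit short and 32-bit int):

unsigned short a = 0xFFFF;
unsigned short b = 1;
/* both operands promote to (signed) int before the addition, so a + b
   is an int with value 0x10000, not a wrapped-around unsigned short */
int sum = a + b;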

Upvotes: 7

Richard Hodges

Reputation: 69882

With a little template jiggery-pokery I think we can get the optimal result in all scenarios automatically:

#include <iostream>
#include <cassert>
#include <type_traits>  // for std::is_signed and std::enable_if_t


template<class T> auto make_unsigned(T i) -> T { return i; }

auto make_unsigned(int i) -> unsigned int {
    assert(i >= 0);
    return static_cast<unsigned int>(i);
}

auto make_unsigned(short i) -> unsigned short {
    assert(i >= 0);
    return static_cast<unsigned short>(i);
}

auto make_unsigned(long long i) -> unsigned long long {
    assert(i >= 0);
    return static_cast<unsigned long long>(i);
}

template<
        class I1,
        class I2,
        std::enable_if_t<(std::is_signed<I1>::value and std::is_signed<I2>::value)
                         or (not std::is_signed<I1>::value and not std::is_signed<I2>::value)>* = nullptr
>
bool unsigned_less(I1 i1, I2 i2) {

    return i1 < i2;
}

template<
        class I1,
        class I2,
        std::enable_if_t<std::is_signed<I1>::value and not std::is_signed<I2>::value>* = nullptr
>
bool unsigned_less(I1 i1, I2 i2) {

    return (i1 < 0) or make_unsigned(i1) < i2;
}

template<
        class I1,
        class I2,
        std::enable_if_t<not std::is_signed<I1>::value and std::is_signed<I2>::value>* = nullptr
>
bool unsigned_less(I1 i1, I2 i2) {

    return not (i2 < 0) and i1 < make_unsigned(i2);
}



int main() {

    short a = 1;
    unsigned int b = 2;

    std::cout << unsigned_less(a, b) << std::endl;

    using uint = unsigned int;
    using ushort = unsigned short;
    std::cout << unsigned_less(ushort(1), int(3)) << std::endl;

    std::cout << unsigned_less(int(-1), uint(0)) << std::endl;
    std::cout << unsigned_less(int(1), uint(0)) << std::endl;

    return 0;
}

Upvotes: 0

Sneftel

Reputation: 41474

Well, you've correctly characterized the situation: C/C++ have no way of doing a full signed int/unsigned int comparison with a single compare.

I would be surprised if promotion to int64 were faster than doing two comparisons. In my experience, compilers are quite good at realizing that a subexpression like that is pure (has no side effects) and thus that there's no need for a second branch. (You can also explicitly opt out of short-circuiting using bitwise OR: (x < 0) | (x < y).) In contrast, my experience is that compilers tend NOT to do much special-case optimization on integers larger than the native word size, so (int64)x < (int64)y is quite likely to actually do a full 64-bit comparison.
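A sketch of that non-short-circuiting form, with a cast added to silence the sign-compare warning (the function name is illustrative):

bool lt_branchless(int x, unsigned int y) {
    // both tests are always evaluated and combined with a bitwise OR,
    // so there is no second branch to predict
    return (x < 0) | (static_cast<unsigned int>(x) < y);
}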

Bottom line, there's no incantation which is guaranteed to produce the best possible machine code on any processor, but for the most common compilers on the most common processors, I would guess that the two-comparison form would be no slower than the promotion-to-int64 form.

EDIT: Some mucking about on Godbolt confirms that on ARM32, GCC puts way too much machinery into the int64 approach, and VC does the same on x86. With x64, though, the int64 approach is actually one instruction shorter (since the promotion and 64-bit comparison are trivial). The pipelining might make the actual performance go either way. https://godbolt.org/g/wyG4yC

Upvotes: 13
