Reputation: 12922

Is there a C snippet that computes overflow-safe addition efficiently without using compiler builtins?

Here is a C function that adds an int to another, failing if overflow would happen:

int safe_add(int *value, int delta) {
        if (*value >= 0) {
                if (delta > INT_MAX - *value) {
                        return -1;
                }
        } else {
                if (delta < INT_MIN - *value) {
                        return -1;
                }
        }

        *value += delta;
        return 0;
}

Unfortunately it is not optimized well by GCC or Clang:

safe_add(int*, int):
        movl    (%rdi), %eax
        testl   %eax, %eax
        js      .L2
        movl    $2147483647, %edx
        subl    %eax, %edx
        cmpl    %esi, %edx
        jl      .L6
.L4:
        addl    %esi, %eax
        movl    %eax, (%rdi)
        xorl    %eax, %eax
        ret
.L2:
        movl    $-2147483648, %edx
        subl    %eax, %edx
        cmpl    %esi, %edx
        jle     .L4
.L6:
        movl    $-1, %eax
        ret

This version with __builtin_add_overflow()

int safe_add(int *value, int delta) {
        int result;
        if (__builtin_add_overflow(*value, delta, &result)) {
                return -1;
        } else {
                *value = result;
                return 0;
        }
}

is optimized better:

safe_add(int*, int):
        xorl    %eax, %eax
        addl    (%rdi), %esi
        seto    %al
        jo      .L5
        movl    %esi, (%rdi)
        ret
.L5:
        movl    $-1, %eax
        ret

but I'm curious if there's a way without using builtins that will get pattern-matched by GCC or Clang.

Upvotes: 10

Answers (4)

Alexander Cherepanov

Reputation: 71

The situation with signed operations is much worse than with unsigned ones, and I see only one pattern for signed addition, only for clang and only when a wider type is available:

int safe_add(int *value, int delta)
{
    long long result = (long long)*value + delta;

    if (result > INT_MAX || result < INT_MIN) {
        return -1;
    } else {
        *value = result;
        return 0;
    }
}

clang gives exactly the same asm as with __builtin_add_overflow:

safe_add:                               # @safe_add
        addl    (%rdi), %esi
        movl    $-1, %eax
        jo      .LBB1_2
        movl    %esi, (%rdi)
        xorl    %eax, %eax
.LBB1_2:
        retq

Otherwise, the simplest solution I can think of is this (with the interface as Jens used):

_Bool overadd(int a[static 1], int b)
{
    // compute the unsigned sum
    unsigned u = (unsigned)a[0] + b;

    // convert it to signed
    int sum = u <= -1u / 2 ? (int)u : -1 - (int)(-1 - u);

    // see if it overflowed or not
    _Bool overflowed = (b > 0) != (sum > a[0]);

    // return the results
    a[0] = sum;
    return overflowed;
}

gcc and clang generate very similar asm. gcc gives this:

overadd:
        movl    (%rdi), %ecx
        testl   %esi, %esi
        setg    %al
        leal    (%rcx,%rsi), %edx
        cmpl    %edx, %ecx
        movl    %edx, (%rdi)
        setl    %dl
        xorl    %edx, %eax
        ret

We want to compute the sum in unsigned, so unsigned have to be able to represent all values of int without any of them sticking together. To easily convert the result from unsigned to int, the opposite is useful too. Overall, two's complement is assumed.

On all popular platforms I think we can convert from unsigned to int by a simple assignment like int sum = u; but, as Jens mentioned, even the latest variant of the C2x standard allows it to raise a signal. The next most natural way is to do something like that: *(unsigned *)&sum = u; but non-trap variants of padding apparently could differ for signed and unsigned types. So the example above goes the hard way. Fortunately, both gcc and clang optimize this tricky conversion away.

P.S. The two variants above could not be compared directly as they have different behavior. The first one follows the original question and doesn't clobber the *value in case of overflow. The second one follows the answer from Jens and always clobbers the variable pointed to by the first parameter but it's branchless.

Upvotes: 3

Jens Gustedt

Reputation: 78903

Tthe best one I came up with, if you don't have access to the overflow flag of the architecture, is to do things in unsigned. Just think of all bit arithmetic here in that we are only interested in the highest bit, which is the sign bit when interpreted as signed values.

(All that modulo sign errors, I didn't check this thouroughly, but I hope the idea is clear)

#include <stdbool.h>

bool overadd(int a[static 1], int b) {
  unsigned A = a[0];
  unsigned B = b;
  // This computation will be done anyhow
  unsigned AB = A + B;
  // See if the sign bits are equal
  unsigned AeB = ~(A^B);
  unsigned AuAB = (A^AB);
  // The function result according to these should be:
  //
  // AeB \ AuAB | false | true
  //------------+-------+------
  // false      | false | false
  // true       | false | true
  //
  // So the expression to compute from the sign bits is (AeB & AuAB)

  // This is INT_MAX
  unsigned M = -1U/2;
  bool ret = (AeB & AuAB) > M;

  if (!ret) a[0] += b;
  return ret;
}

If you find a version of the addition that is free of UB, such as an atomic one, the assembler is even without branch (but with a lock prefix)

#include <stdbool.h>
#include <stdatomic.h>
bool overadd(_Atomic(int) a[static 1], int b) {
  unsigned A = a[0];
  atomic_fetch_add_explicit(a, b, memory_order_relaxed);
  unsigned B = b;
  // This computation will be done anyhow
  unsigned AB = A + B;
  // See if the sign bits are equal
  unsigned AeB = ~(A^B);
  unsigned AuAB = (A^AB);
  // The function result according to these should be:
  //
  // AeB \ AuAB | false | true
  //------------+-------+------
  // false      | false | false
  // true       | false | true
  //
  // So the expression to compute from the sign bits is (AeB & AuAB)

  // This is INT_MAX
  unsigned M = -1U/2;
  bool ret = (AeB & AuAB) > M;
  return ret;
}

So if we had such an operation, but even more "relaxed" this could improve the situation even further.

Take3: If we use a special "cast" from the unsigned result to the signed one, this now is branch free:

#include <stdbool.h>
#include <stdatomic.h>

bool overadd(int a[static 1], int b) {
  unsigned A = a[0];
  //atomic_fetch_add_explicit(a, b, memory_order_relaxed);
  unsigned B = b;
  // This computation will be done anyhow
  unsigned AB = A + B;
  // See if the sign bits are equal
  unsigned AeB = ~(A^B);
  unsigned AuAB = (A^AB);
  // The function result according to these should be:
  //
  // AeB \ AuAB | false | true
  //------------+-------+------
  // false      | false | false
  // true       | false | true
  //
  // So the expression to compute from the sign bits is (AeB & AuAB)

  // This is INT_MAX
  unsigned M = -1U/2;
  unsigned res = (AeB & AuAB);
  signed N = M-1;
  N = -N - 1;
  a[0] =  ((AB > M) ? -(int)(-AB) : ((AB != M) ? (int)AB : N));
  return res > M;
}

Upvotes: 5

Konrad Rudolph

Reputation: 545578

I could get the compiler to use the sign flag by assuming (and asserting) a two’s complement representation without padding bytes. Such implementations should yield the required behaviour in the line annotated by a comment, although I can’t find a positive formal confirmation of this requirement in the standard (and there probably isn’t any).

Note that the following code only handles positive integer addition, but can be extended.

int safe_add(int* lhs, int rhs) {
    _Static_assert(-1 == ~0, "integers are not two's complement");
    _Static_assert(
        1u << (sizeof(int) * CHAR_BIT - 1) == (unsigned) INT_MIN,
        "integers have padding bytes"
    );
    unsigned value = *lhs;
    value += rhs;
    if ((int) value < 0) return -1; // impl. def., 6.3.1.3/3
    *lhs = value;
    return 0;
}

This yields on both clang and GCC:

safe_add:
        add     esi, DWORD PTR [rdi]
        js      .L3
        mov     DWORD PTR [rdi], esi
        xor     eax, eax
        ret
.L3:
        mov     eax, -1
        ret

Upvotes: 0

Iłya Bursov

Reputation: 24146

the best version I can come up with is:

int safe_add(int *value, int delta) {
    long long t = *value + (long long)delta;
    if (t != ((int)t))
        return -1;
    *value = (int) t;
    return 0;
}

which produces:

safe_add(int*, int):
    movslq  %esi, %rax
    movslq  (%rdi), %rsi
    addq    %rax, %rsi
    movslq  %esi, %rax
    cmpq    %rsi, %rax
    jne     .L3
    movl    %eax, (%rdi)
    xorl    %eax, %eax
    ret
.L3:
    movl    $-1, %eax
    ret

Upvotes: 1

Is there a C snippet that computes overflow-safe addition efficiently without using compiler builtins?

Answers (4)

Related Questions