Martijn Courteaux
Martijn Courteaux

Reputation: 68847

Assembly ADC (Add with carry) to C++

There is an x86 assembly instruction ADC. I've found this means "Add with carry". What does this mean/do? How would one implement the behavior of this instruction in C++?

INFO:
Compiled on Windows. I'm using a 32-bit Windows Installation. My processor is Core 2 Duo from Intel.

Upvotes: 23

Views: 59963

Answers (9)

Peter Cordes
Peter Cordes

Reputation: 364180

The C++ language doesn't have any concept of a carry flag, so making an intrinsic function wrapper around the ADC instruction is clunky. However, Intel did it anyway: unsigned char _addcarry_u32 (unsigned char c_in, unsigned a, unsigned b, unsigned * out);. Last I checked, gcc did a poor job with this (saving the carry result into an integer register, instead of leaving it in CF), but hopefully Intel's own compiler does better.

See also the tag wiki for assembly documentation.


The compiler will use ADC for you when adding integers wider than a single register, e.g. adding int64_t in 32bit code, or __int128_t in 64bit code.

#include <stdint.h>
#ifdef __x86_64__
__int128_t add128(__int128_t a, __int128_t b) { return a+b; }
#endif
    # clang 3.8 -O3  for x86-64, SystemV ABI.
    # __int128_t args passed in 2 regs each, and returned in rdx:rax
    add     rdi, rdx
    adc     rsi, rcx
    mov     rax, rdi
    mov     rdx, rsi
    ret

asm output from the Godbolt compiler explorer. clang's -fverbose-asm isn't very vebose, but gcc 5.3 / 6.1 wastes two mov instructions so it's less readable.

You can sometimes hand-hold compilers into emitting an adc or otherwise using the carry-out of add using the idiom uint64_t sum = a+b; / carry = sum < a;. But extending this to get a carry-out from an adc instead of add is not possible with current compilers; c+d+carry_in can wrap all the way around, and compilers don't manage to optimize the multiple checks for carry out on each + in c+d+carry if you do it safely.


Clang _ExtInt

There is one way I'm aware of to get a chain of add/adc/.../adc: Clang's new _ExtInt(width) feature that provides fixed-bit-width types of any size up to 16,777,215 bits (blog post). It was added to clang's development version on April 21, 2020, so it's not yet in any released version.

This will hopefully show up in ISO C and/or C++ at some point; The N2472 proposal is apparently being "being actively considered by the ISO WG14 C Language Committee"

(Update: this is now in C23 as _BitInt(256), although the max supported bit-width is implementation-dependent and might be as low as 128. Clang supports it as an extension in C++, but GCC and MSVC don't.)

typedef _ExtInt(256) wide_int;

wide_int add ( wide_int a, wide_int b) {
    return a+b;
}

compiles as follows with clang trunk -O2 for x86-64 (Godbolt):

add(int _ExtInt<256>, int _ExtInt<256>):
        add     rsi, r9
        adc     rdx, qword ptr [rsp + 8]
        adc     rcx, qword ptr [rsp + 16]
        mov     rax, rdi                        # return the retval pointer
        adc     r8, qword ptr [rsp + 24]        # chain of ADD / 3x ADC!

        mov     qword ptr [rdi + 8], rdx        # store results to mem
        mov     qword ptr [rdi], rsi
        mov     qword ptr [rdi + 16], rcx
        mov     qword ptr [rdi + 24], r8
        ret

Apparently _ExtInt is passed by value in integer registers until the calling convention runs out of registers. (At least in this early version; Perhaps x86-64 SysV should class it as "memory" when it's wider than 2 or maybe 3 registers, like structs larger than 16 bytes. Although moreso than structs, having it in registers is likely to be useful. Just put other args first so they're not displaced.)

The first _ExtInt arg is in R8:RCX:RDX:RSI, and the second has its low qword in R9, with the rest in memory.

A pointer to the return-value object is passed as a hidden first arg in RDI; x86-64 System V only ever returns in up to 2 integer registers (RDX:RAX) and this doesn't change that.

Upvotes: 10

Pierre
Pierre

Reputation: 570

There are __builtin_uadd_overflow intrinsics to build additions chains with the optimal use of add/adc by the compiler.

From old documentation (as old as gcc 5 !!!!)

https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html

https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Integer-Overflow-Builtins.html

Both clang and GCC do implement these builtins, and I verified the generated code is optimal on both x86_64 and aarch64 targets

This feature has now been released in the latest product update released version Visual Studio 2022 version 17.7 (e.g. _add_overflow_i8 ....)

#include <stdint.h>

typedef unsigned __int128 uint128_t;


// carry_out = a + b + carry_in
uint8_t my_addcarry_u64(uint8_t carry_in, uint64_t a, uint64_t b, uint64_t * sum)
{
        bool c;
        uint64_t res;
        c = __builtin_uaddll_overflow (a, b, (long long unsigned *)&res);
        c |= __builtin_uaddll_overflow (res, carry_in, (long long unsigned *)&res);
        *sum = res;
        return c;
}

// carry_out = a + b + carry_in
uint8_t my_addcarry_u128(uint8_t carry_in, uint128_t a, uint128_t b, uint128_t * sum)
{
        bool c;
        uint64_t res_lo, res_hi;
        c = __builtin_uaddll_overflow (a, b, (long long unsigned *)&res_lo);
        c |= __builtin_uaddll_overflow (carry_in, res_lo, (long long unsigned *)&res_lo);
        c = __builtin_uaddll_overflow (a >> 64, c, (long long unsigned *)&res_hi);
        c |= __builtin_uaddll_overflow (b >> 64, res_hi, (long long unsigned *)&res_hi);
        *sum = ((uint128_t)res_hi << 64) + res_lo;
        return c;
}

This thread is quite old, I provide a response to the original question "How would one implement the behavior of this instruction in C++" .

Upvotes: 2

Loge
Loge

Reputation: 13

This is my fastest Code:

template <class Ty>
constexpr bool add_carry_ux(bool carry_in, Ty src1, Ty src2, Ty* sum_out)
{
    const Ty sum = src1 + src2;
    const bool carry_out = (sum < src1) | ((sum == ~static_cast<Ty>(0)) & carry_in);

    *sum_out = sum + carry_in;

    return carry_out;
}

ASM:

add_carry_ux(bool, unsigned long, unsigned long, unsigned long*):
        add     rsi, rdx
        movzx   eax, dil
        setc    dl
        add     rax, rsi
        cmp     rsi, -1
        mov     QWORD PTR [rcx], rax
        sete    al
        movzx   edx, dl
        and     eax, edi
        or      eax, edx
        ret

Upvotes: 1

Řrřola
Řrřola

Reputation: 6307

In x86-64, the ADD instruction adds two 64-bit integers: add rax, rbx does rax = rax + rbx.
It also sets the carry flag to 1 when there was unsigned overflow (= when the result didn't fit in 64 bits), otherwise it sets the carry flag to 0.

In C++, you can simulate ADD like this:

uint64_t a, b;
bool carry;

a += b;
carry = (a < b);  // a+b can't be smaller than b: there must have been an overflow

The ADC instruction is like ADD, but adds the carry flag to the result:
adc rax, rbx does rax = rax + rbx + carry_flag.
It also sets the carry flag if there was unsigned overflow.

In C++:

uint64_t tmp = b + carry;
a += tmp;
carry = (tmp < carry) + (a < tmp);  // only one overflow can happen

The ADD and ADC instructions can be used to add big integers (with n "digits").
Use ADD for the least significant digits, then use ADC (n – 1) times to add the rest of the digits.
This is the “schoolbook addition algorithm”.

For example, adding 256-bit big integers with four 64-bit "digits":

mov rax, [rsi]        ; load the least significant source digit
mov rbx, [rsi + 8]    ; ...
mov rcx, [rsi + 16]
mov rdx, [rsi + 24]
add [rdi], rax        ; add it to the least significant destination digit
adc [rdi + 8], rbx    ; ... propagate carry up
adc [rdi + 16], rcx
adc [rdi + 24], rdx

Recent versions of the clang compiler can recognize big integer addition and use ADD/ADC to implement it.

constexpr uint64_t n = 4;
uint64_t dst[n], src[n];

// Add src to dst.
uint64_t carry = 0;
for (int i = 0; i < n; i++) {
  uint64_t tmp = src[i] + carry;
  dst[i] += tmp;
  carry = (tmp < carry) + (dst[i] < tmp);
}

Upvotes: 5

Serg Stetsuk
Serg Stetsuk

Reputation: 429

int32_t adc(uint32_t first, uint32_t second, uint32_t *carry)
    {
        uint32_t res;
        uint32_t carry_out = 0;
    
        if (!*carry)
        {
            res = first + second;
            *carry = (res < first) && (res < second);
    
            return res;
        }
    
        res = adc(first, second, &carry_out);
        if (*carry)
        {
            res++;
            carry_out |= !res;
        }
        
        *carry = carry_out;
    
        return res;
    }

Upvotes: 0

Chubsdad
Chubsdad

Reputation: 25497

From here (broken) or here

However, Intel processor has a special instruction called adc. This command behaves similarly as the add command. The only extra thing is that it also add the value carry flag along. So, this may be very handy to add large integers. Suppose you'd like to add a 32-bit integers with 16-bit registers. How can we do that? Well, let's say that the first integer is held on the register pair DX:AX, and the second one is on BX:CX. This is how:

add  ax, cx
adc  dx, bx

Ah, so first, the lower 16-bit is added by add ax, cx. Then the higher 16-bit is added using adc instead of add. It is because: if there are overflows, the carry bit is automatically added in the higher 16-bit. So, no cumbersome checking. This method can be extended to 64 bits and so on... Note that: If the 32-bit integer addition overflows too at the higher 16-bit, the result will not be correct and the carry flag is set, e.g. Adding 5 billion to 5 billion.

Everything from here on, remember that it falls pretty much into the zone of implementation defined behavior.

Here's a small sample that works for VS 2010 (32-bit, WinXp)

Caveat: $7.4/1- "The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note ]"

int main(){
   bool carry = false;
   int x = 0xffffffff + 0xffffffff;
   __asm {
      jc setcarry
setcarry:
      mov carry, 1
   }
}

Upvotes: 9

Oshkosher
Oshkosher

Reputation: 59

There is a bug in this. Try this input:

unsigned first[10] =  {0x00000001};
unsigned second[10] = {0xffffffff, 0xffffffff};

The result should be {0, 0, 1, ...} but the result is {0, 0, 0, ...}

Changing this line:

carry = (first[i] > result[i]);

to this:

if (carry)
    carry = (first[i] >= result[i]);
else
    carry = (first[i] > result[i]);

fixes it.

Upvotes: 4

Sparky
Sparky

Reputation: 14057

The ADC behaviour can be simulated in both C and C++. The following example adds two numbers (stored as arrays of unsigned as they are too large to fit into a single unsigned).

unsigned first[10];
unsigned second[10];
unsigned result[11];

....   /* first and second get defined */

unsigned carry = 0;
for (i = 0; i < 10; i++) {
    result[i] = first[i] + second[i] + carry;
    carry = (first[i] > result[i]);
}
result[10] = carry;

Hope this helps.

Upvotes: 10

Simone
Simone

Reputation: 11797

ADC is the same as ADD but adds an extra 1 if processor's carry flag is set.

Upvotes: 34

Related Questions