Reputation: 68847
There is an x86 assembly instruction ADC. I've found this means "Add with carry". What does this mean/do? How would one implement the behavior of this instruction in C++?
INFO:
Compiling on Windows. I'm using a 32-bit Windows installation. My processor is an Intel Core 2 Duo.
Upvotes: 23
Views: 59963
Reputation: 364180
The C++ language doesn't have any concept of a carry flag, so making an intrinsic function wrapper around the ADC instruction is clunky. However, Intel did it anyway: unsigned char _addcarry_u32 (unsigned char c_in, unsigned a, unsigned b, unsigned * out);
Last I checked, gcc did a poor job with this (saving the carry result into an integer register instead of leaving it in CF), but hopefully Intel's own compiler does better.
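For example, a usage sketch of that intrinsic (illustrative; the intrinsic typically comes from <immintrin.h> with GCC/clang, or <intrin.h> with MSVC):
#include <immintrin.h>
// 64-bit addition done as two 32-bit halves: the first call plays the role of ADD,
// the second consumes its carry-out like ADC.
void add64_halves(unsigned a[2], const unsigned b[2])
{
    unsigned char c = _addcarry_u32(0, a[0], b[0], &a[0]);
    _addcarry_u32(c, a[1], b[1], &a[1]);
}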
See also the x86 tag wiki for assembly documentation.
The compiler will use ADC for you when adding integers wider than a single register, e.g. adding int64_t in 32-bit code, or __int128_t in 64-bit code.
#include <stdint.h>
#ifdef __x86_64__
__int128_t add128(__int128_t a, __int128_t b) { return a+b; }
#endif
# clang 3.8 -O3 for x86-64, SystemV ABI.
# __int128_t args passed in 2 regs each, and returned in rdx:rax
add rdi, rdx
adc rsi, rcx
mov rax, rdi
mov rdx, rsi
ret
asm output from the Godbolt compiler explorer. clang's -fverbose-asm isn't very verbose, but gcc 5.3 / 6.1 wastes two mov instructions so it's less readable.
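The 32-bit case mentioned above behaves the same way; a minimal sketch (compiled with -m32, GCC and clang typically emit an add of the low halves followed by an adc of the high halves):
#include <stdint.h>
// In 32-bit code no single register holds an int64_t, so the compiler
// adds the low 32 bits with add and the high 32 bits with adc.
int64_t add64(int64_t a, int64_t b) {
    return a + b;
}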
You can sometimes hand-hold compilers into emitting an adc, or otherwise using the carry-out of an add, with the idiom uint64_t sum = a+b; / carry = sum < a;. But extending this to get a carry-out from an adc instead of an add is not possible with current compilers; c+d+carry_in can wrap all the way around, and compilers don't manage to optimize away the multiple carry-out checks on each + in c+d+carry if you do it safely.
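A sketch of that idiom as a stand-alone helper (names are illustrative, not from the answer above):
#include <stdint.h>
// One 64-bit add plus its carry-out, using the sum < a idiom.
uint64_t add_with_carryout(uint64_t a, uint64_t b, uint64_t *carry_out) {
    uint64_t sum = a + b;
    *carry_out = sum < a;   // unsigned wraparound means the add carried
    return sum;
}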
_ExtInt
There is one way I'm aware of to get a chain of add/adc/.../adc: Clang's new _ExtInt(width) feature that provides fixed-bit-width types of any size up to 16,777,215 bits (blog post). It was added to clang's development version on April 21, 2020, so it wasn't yet in any released version at the time this was written.
This will hopefully show up in ISO C and/or C++ at some point; the N2472 proposal is apparently "being actively considered by the ISO WG14 C Language Committee".
(Update: this is now in C23 as _BitInt(256), although the max supported bit-width is implementation-dependent and might be as low as 128. Clang supports it as an extension in C++, but GCC and MSVC don't.)
typedef _ExtInt(256) wide_int;
wide_int add ( wide_int a, wide_int b) {
return a+b;
}
compiles as follows with clang trunk -O2 for x86-64 (Godbolt):
add(int _ExtInt<256>, int _ExtInt<256>):
add rsi, r9
adc rdx, qword ptr [rsp + 8]
adc rcx, qword ptr [rsp + 16]
mov rax, rdi # return the retval pointer
adc r8, qword ptr [rsp + 24] # chain of ADD / 3x ADC!
mov qword ptr [rdi + 8], rdx # store results to mem
mov qword ptr [rdi], rsi
mov qword ptr [rdi + 16], rcx
mov qword ptr [rdi + 24], r8
ret
Apparently _ExtInt is passed by value in integer registers until the calling convention runs out of registers. (At least in this early version; perhaps x86-64 SysV should class it as "memory" when it's wider than 2 or maybe 3 registers, like structs larger than 16 bytes. Although more so than with structs, having it in registers is likely to be useful. Just put other args first so they're not displaced.)
The first _ExtInt arg is in R8:RCX:RDX:RSI, and the second has its low qword in R9, with the rest in memory.
A pointer to the return-value object is passed as a hidden first arg in RDI; x86-64 System V only ever returns in up to 2 integer registers (RDX:RAX) and this doesn't change that.
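For reference, the C23 spelling from the update above is a drop-in replacement (a sketch; it needs a compiler with 256-bit _BitInt support, e.g. recent clang):
typedef _BitInt(256) wide_int;
// Same 256-bit addition; clang compiles this to the same add / adc / adc / adc chain.
wide_int add(wide_int a, wide_int b) {
    return a + b;
}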
Upvotes: 10
Reputation: 570
There are __builtin_uadd_overflow intrinsics for building addition chains where the compiler makes optimal use of add/adc.
They have been documented for a long time (as far back as gcc 5!):
https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Integer-Overflow-Builtins.html
Both clang and GCC implement these builtins, and I verified that the generated code is optimal on both x86_64 and aarch64 targets.
This feature has now also been released in Visual Studio 2022 version 17.7 (e.g. _add_overflow_i8 ....).
#include <stdint.h>
#include <stdbool.h>
typedef unsigned __int128 uint128_t;
// *sum = a + b + carry_in; returns the carry-out
uint8_t my_addcarry_u64(uint8_t carry_in, uint64_t a, uint64_t b, uint64_t * sum)
{
bool c;
uint64_t res;
c = __builtin_uaddll_overflow (a, b, (long long unsigned *)&res);
c |= __builtin_uaddll_overflow (res, carry_in, (long long unsigned *)&res);
*sum = res;
return c;
}
// *sum = a + b + carry_in; returns the carry-out
uint8_t my_addcarry_u128(uint8_t carry_in, uint128_t a, uint128_t b, uint128_t * sum)
{
bool c;
uint64_t res_lo, res_hi;
// low 64 bits: a_lo + b_lo, then add carry_in (at most one of the two steps can carry)
c = __builtin_uaddll_overflow (a, b, (long long unsigned *)&res_lo);
c |= __builtin_uaddll_overflow (carry_in, res_lo, (long long unsigned *)&res_lo);
// high 64 bits: a_hi plus the carry from the low half, then add b_hi
c = __builtin_uaddll_overflow (a >> 64, c, (long long unsigned *)&res_hi);
c |= __builtin_uaddll_overflow (b >> 64, res_hi, (long long unsigned *)&res_hi);
*sum = ((uint128_t)res_hi << 64) + res_lo;
return c;
}
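As a usage sketch (an illustrative helper building on my_addcarry_u64 above, not part of the original answer), the carry can be chained across an array of 64-bit limbs:
#include <stddef.h>
// Add two n-limb little-endian big integers: dst[i] = a[i] + b[i] + carry.
// Returns the final carry-out.
uint8_t my_add_n(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint8_t carry = 0;
    for (size_t i = 0; i < n; i++)
        carry = my_addcarry_u64(carry, a[i], b[i], &dst[i]);
    return carry;
}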
This thread is quite old, but I'm providing a response to the original question, "How would one implement the behavior of this instruction in C++?".
Upvotes: 2
Reputation: 13
This is my fastest code:
template <class Ty>
constexpr bool add_carry_ux(bool carry_in, Ty src1, Ty src2, Ty* sum_out)
{
const Ty sum = src1 + src2;
const bool carry_out = (sum < src1) | ((sum == ~static_cast<Ty>(0)) & carry_in);
*sum_out = sum + carry_in;
return carry_out;
}
ASM:
add_carry_ux(bool, unsigned long, unsigned long, unsigned long*):
add rsi, rdx
movzx eax, dil
setc dl
add rax, rsi
cmp rsi, -1
mov QWORD PTR [rcx], rax
sete al
movzx edx, dl
and eax, edi
or eax, edx
ret
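As a usage sketch (illustrative, assuming the template above), two calls give a 128-bit add the same way an add/adc pair would:
#include <cstdint>
// 128-bit addition from two 64-bit limbs: the first call plays the role of ADD,
// the second consumes its carry like ADC.
void add_u128(uint64_t a_lo, uint64_t a_hi, uint64_t b_lo, uint64_t b_hi,
              uint64_t* sum_lo, uint64_t* sum_hi)
{
    bool carry = add_carry_ux(false, a_lo, b_lo, sum_lo);
    add_carry_ux(carry, a_hi, b_hi, sum_hi);
}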
Upvotes: 1
Reputation: 6307
In x86-64, the ADD instruction adds two 64-bit integers: add rax, rbx does rax = rax + rbx.
It also sets the carry flag to 1 when there was unsigned overflow (= when the result didn't fit in 64 bits), otherwise it sets the carry flag to 0.
In C++, you can simulate ADD like this:
uint64_t a, b;
bool carry;
a += b;
carry = (a < b); // a+b can't be smaller than b: there must have been an overflow
The ADC instruction is like ADD, but adds the carry flag to the result:
adc rax, rbx does rax = rax + rbx + carry_flag.
It also sets the carry flag if there was unsigned overflow.
In C++:
uint64_t tmp = b + carry;
a += tmp;
carry = (tmp < carry) + (a < tmp); // only one overflow can happen
The ADD and ADC instructions can be used to add big integers (with n "digits").
Use ADD for the least significant digit, then use ADC (n − 1) times to add the rest of the digits.
This is the “schoolbook addition algorithm”.
For example, adding 256-bit big integers with four 64-bit "digits":
mov rax, [rsi] ; load the least significant source digit
mov rbx, [rsi + 8] ; ...
mov rcx, [rsi + 16]
mov rdx, [rsi + 24]
add [rdi], rax ; add it to the least significant destination digit
adc [rdi + 8], rbx ; ... propagate carry up
adc [rdi + 16], rcx
adc [rdi + 24], rdx
Recent versions of the clang compiler can recognize big-integer addition and use ADD/ADC to implement it.
constexpr uint64_t n = 4;
uint64_t dst[n], src[n];
// Add src to dst.
uint64_t carry = 0;
for (int i = 0; i < n; i++) {
uint64_t tmp = src[i] + carry;
dst[i] += tmp;
carry = (tmp < carry) + (dst[i] < tmp);
}
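Wrapped up as a stand-alone function (a sketch of what one might paste into a compiler explorer to check for the ADD/ADC chain):
#include <stdint.h>
// Add the 4-limb (256-bit) little-endian big integer src into dst.
// Returns the final carry-out.
uint64_t bigint_add256(uint64_t dst[4], const uint64_t src[4]) {
    uint64_t carry = 0;
    for (uint64_t i = 0; i < 4; i++) {
        uint64_t tmp = src[i] + carry;
        dst[i] += tmp;
        carry = (tmp < carry) + (dst[i] < tmp);
    }
    return carry;
}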
Upvotes: 5
Reputation: 429
#include <stdint.h>

// Simulates ADC: returns first + second + *carry, leaving the carry-out in *carry.
uint32_t adc(uint32_t first, uint32_t second, uint32_t *carry)
{
    uint32_t res;
    uint32_t carry_out = 0;
    if (!*carry)
    {
        // No carry-in: a plain ADD. On unsigned overflow the sum wraps,
        // so it compares less than either operand.
        res = first + second;
        *carry = (res < first) && (res < second);
        return res;
    }
    // Carry-in set: do the plain add first, then fold in the extra 1.
    res = adc(first, second, &carry_out);
    if (*carry)
    {
        res++;
        carry_out |= !res; // incrementing 0xFFFFFFFF wraps to 0, which carries out
    }
    *carry = carry_out;
    return res;
}
Upvotes: 0
Reputation: 25497
However, Intel processors have a special instruction called adc. This command behaves similarly to the add command; the only extra thing is that it also adds the value of the carry flag. So, this can be very handy for adding large integers. Suppose you'd like to add 32-bit integers using 16-bit registers. How can we do that? Well, let's say that the first integer is held in the register pair DX:AX, and the second one is in BX:CX. This is how:
add ax, cx
adc dx, bx
Ah, so first, the lower 16 bits are added by add ax, cx. Then the higher 16 bits are added using adc instead of add, because if there is an overflow, the carry bit is automatically added into the higher 16 bits. So, no cumbersome checking. This method can be extended to 64 bits and so on... Note that if the 32-bit addition itself overflows in the higher 16 bits, the result will not be correct, but the carry flag will be set, e.g. when adding 5 billion to 5 billion.
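In C, the same carry propagation through 16-bit halves could be sketched like this (purely illustrative, not from the original answer):
#include <stdint.h>
// Add two 32-bit values held as 16-bit halves, mirroring the DX:AX / BX:CX example:
// the low halves are added first, and their carry is folded into the high halves.
void add32_via_16(uint16_t *dst_hi, uint16_t *dst_lo, uint16_t src_hi, uint16_t src_lo)
{
    uint32_t lo = (uint32_t)*dst_lo + src_lo;          /* add ax, cx */
    uint16_t carry = (uint16_t)(lo >> 16);             /* the carry flag */
    *dst_lo = (uint16_t)lo;
    *dst_hi = (uint16_t)(*dst_hi + src_hi + carry);    /* adc dx, bx */
}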
Everything from here on falls pretty much into the zone of implementation-defined behavior.
Here's a small sample that works for VS 2010 (32-bit, WinXp)
Caveat: §7.4/1: "The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note ]"
int main(){
    bool carry = false;
    unsigned int x = 0;
    __asm {
        mov eax, 0xffffffff
        add eax, 0xffffffff   ; sets CF: the result doesn't fit in 32 bits
        mov x, eax
        jnc done              ; skip the flag store if CF is clear
        mov carry, 1
    done:
    }
    return carry;             // 1 here, since the addition carried
}
Upvotes: 9
Reputation: 59
There is a bug in this. Try this input:
unsigned first[10] = {0x00000001};
unsigned second[10] = {0xffffffff, 0xffffffff};
The result should be {0, 0, 1, ...} but the result is {0, 0, 0, ...}
Changing this line:
carry = (first[i] > result[i]);
to this:
if (carry)
carry = (first[i] >= result[i]);
else
carry = (first[i] > result[i]);
fixes it.
Upvotes: 4
Reputation: 14057
The ADC behaviour can be simulated in both C and C++. The following example adds two numbers (stored as arrays of unsigned as they are too large to fit into a single unsigned).
unsigned first[10];
unsigned second[10];
unsigned result[11];
.... /* first and second get defined */
unsigned carry = 0;
for (int i = 0; i < 10; i++) {
result[i] = first[i] + second[i] + carry;
carry = (first[i] > result[i]);
}
result[10] = carry;
Hope this helps.
Upvotes: 10
Reputation: 11797
ADC is the same as ADD, but adds an extra 1 if the processor's carry flag is set.
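In C-like terms, one rough model (a sketch, widening the arithmetic so the carry bit is visible; not code from this answer):
#include <stdint.h>
#include <stdbool.h>
// ADC: dst = dst + src + carry_flag, and the carry flag is updated on
// unsigned overflow, exactly as ADD does except for the extra + carry_flag.
uint32_t adc_model(uint32_t dst, uint32_t src, bool *carry_flag)
{
    uint64_t wide = (uint64_t)dst + src + *carry_flag;   // can't overflow 64 bits
    *carry_flag = (wide >> 32) != 0;                     // new carry flag
    return (uint32_t)wide;
}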
Upvotes: 34