echinodermata

Reputation: 787

Why do BigInteger implementations use sign-magnitude instead of two's complement?

Arbitrary-precision signed integers are almost always implemented using a sign-magnitude representation: for example, Java's BigInteger, CPython's int, and GMP's mpz all store a sign plus an array of magnitude words.

The clear preference for sign-magnitude stands in contrast to the near-universal preference for two's complement in fixed-width signed integer types. The question is, why is sign-magnitude so clearly preferred for BigIntegers? (I welcome counterexamples if you disagree with the premise.)

Note that BigInteger APIs typically specify "as-if two's complement" semantics (e.g. Java, Python) for bitwise operations, where it matters. This provides consistency with the usual meaning of those operations. It does not dictate the actual internal representation, which is merely an implementation detail, but it should count as a point in favor of using two's complement internally, if all else were equal.
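Python's built-in int illustrates these "as-if" semantics: CPython stores sign and magnitude internally, yet the bitwise operators behave as if every negative number had an infinitely sign-extended two's-complement bit pattern:

```python
# Python ints are sign-magnitude internally, but bitwise operators
# act on a conceptually infinite two's-complement view of the value.

# -1 behaves like an infinite string of 1-bits:
assert -1 & 0xFF == 0xFF

# ~x is defined as -x - 1, exactly as in two's complement:
assert ~5 == -6

# Masking a negative number extracts its low bits as if two's complement:
assert -2 & 0b1111 == 0b1110
```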

Floating-point numbers use sign-magnitude, unlike ints which use two's complement. Floating point is not really a guiding precedent here, though, since the behavior and algorithms for floating-point arithmetic are significantly different from integer arithmetic. Bignums are much more like ints than floats.

We know the "textbook" reasons why two's complement works mathematically and why it has advantages. It seems to me that those reasons apply equally to both ints and BigIntegers. To what extent is that really true?

There is, of course, a vast difference between the design constraints of hardware fixed-precision integers and software arbitrary-precision integers. In this sense, it is not at all surprising to see that designers made different tradeoffs in these different domains. What, then, are the tradeoffs between sign-magnitude and two's complement, as applied to arbitrary-precision integers? For example, this could be in terms of the performance or simplicity of certain important algorithms.

I hope your answer will illuminate the design considerations that go into BigInteger arithmetic, as well as help me re-examine what I know about two's complement from a new perspective.

(To be clear: When I say two's complement for arbitrary-precision integers, I mean a representation using an array of words whose bit pattern, when put together, is the two's complement representation of the desired number - perhaps with the additional requirement that there be no "unnecessary leading 0's" (for nonnegative numbers) or "unnecessary leading 1's" (for negative numbers).)

Upvotes: 9

Views: 1509

Answers (2)

rcgldr

Reputation: 28921

Two's complement makes add and subtract a bit simpler for equal-length numbers, but makes multiply and divide more complicated. For a hardware implementation there may be a time penalty, but not always. Looking at the x86 "Ivy Bridge" instruction tables, the only case where two's complement shows up as taking more time is a 128-bit signed dividend divided by a 64-bit signed divisor. So this is mostly an issue for software-based math.
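To make the add/subtract point concrete, here is a minimal sketch (my own illustration, not any particular library's code) of adding two equal-length little-endian 32-bit word arrays: with two's complement, one uniform carry loop handles every sign combination:

```python
BASE = 1 << 32

def add_words(a, b):
    """One carry loop adds two equal-length two's-complement word
    arrays, whatever the signs of the operands happen to be."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & (BASE - 1))
        carry = s >> 32
    return out  # result truncated to len(a) words, as in hardware

# 5 + (-3): -3 as two 32-bit words is [0xFFFFFFFD, 0xFFFFFFFF]
assert add_words([5, 0], [0xFFFFFFFD, 0xFFFFFFFF]) == [2, 0]
```

A sign-magnitude add must branch instead: same signs means add the magnitudes; different signs means compare the magnitudes and subtract the smaller from the larger, with the comparison also determining the result's sign.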

Big integer libraries may use more complex but faster representations for large numbers. Here are some links to example articles:

https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic

https://cp-algorithms.com/algebra/big-integer.html

http://www.apfloat.org/ntt.html

The more complex methods are mostly faster for fairly large numbers; for medium-sized numbers, simpler implementations will be faster.
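As an illustration of that crossover, here is a toy Karatsuba multiply (one of the asymptotically faster methods the articles above cover), with a cutoff below which it falls back to plain multiplication. Note that it operates on nonnegative values, which is one reason keeping the sign separate from the magnitude is convenient:

```python
def karatsuba(x, y, cutoff=1 << 64):
    """Toy Karatsuba multiply on nonnegative integers. Real libraries
    fall back to schoolbook multiplication below a tuned cutoff,
    because the recursion overhead dominates for small inputs."""
    if x < cutoff or y < cutoff:
        return x * y
    n = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> n, x & ((1 << n) - 1)
    yh, yl = y >> n, y & ((1 << n) - 1)
    hi = karatsuba(xh, yh)
    lo = karatsuba(xl, yl)
    # (xh + xl)(yh + yl) - hi - lo == xh*yl + xl*yh: three recursive
    # multiplies instead of four.
    mid = karatsuba(xh + xl, yh + yl) - hi - lo
    return (hi << (2 * n)) + (mid << n) + lo

a, b = 3 ** 300, 7 ** 250
assert karatsuba(a, b) == a * b
```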

Upvotes: 6

Spektre

Reputation: 51873

Having built a few of my own bignum libs, I agree with rcgldr's answer (+1): two's complement brings up problems in higher operations, not just * and /.

On top of all of this, some bignum libraries do not use a power of 2 as the base, and using two's complement with such a base is even trickier. The reason for not using a power of 2 is that we compute in base 10, so we expect input and output in base 10. The conversion between base 2 (or a power of 2) and base 10 is IIRC an ~O(n^2) task, and for really big numbers it usually costs more than the operation performed on them. So these libs use the biggest power of 10 that fits into the ALU word; for example, in 32 bits it is 1 000 000 000. This wastes a little space but reduces the input and output conversions between number and string representations to O(n), where n is the number of digits or words used.
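For instance, a base-10^9 limb array converts to a decimal string in O(n), since each limb maps directly to nine decimal digits (a sketch; the names are mine):

```python
BASE = 10 ** 9  # largest power of 10 that fits in a 32-bit word

def to_decimal_string(limbs):
    """limbs: little-endian base-10^9 digits. O(n) conversion:
    every limb below the top one prints as exactly nine digits."""
    parts = [str(limbs[-1])]  # most significant limb, no zero padding
    parts += [f"{d:09d}" for d in reversed(limbs[:-1])]
    return "".join(parts)

# 12 * 10^18 + 345678901 * 10^9 + 234567890:
assert to_decimal_string([234567890, 345678901, 12]) \
    == "12345678901234567890"
```

With a power-of-2 base, the same conversion needs repeated division by 10 (or by 10^9), which is where the ~O(n^2) cost comes from.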

Two's complement also complicates the modular arithmetic needed for many underlying operations, such as multiplication by NTT.

Handling and recovering the two's complement takes O(n), while a separate sign takes just O(1), which I think is the main reason for this.
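The cost difference is easy to see in a sketch (my own illustration): with a separate sign, negation flips one flag and leaves the magnitude words alone; with two's-complement limbs, every word must be complemented and a carry propagated through the whole array:

```python
BASE = 1 << 32

def negate_sign_magnitude(sign, words):
    """O(1): flip the sign flag; the magnitude words are untouched."""
    return -sign, words

def negate_twos_complement(words):
    """O(n): complement every word, then add 1 with carry propagation."""
    out, carry = [], 1
    for w in words:
        s = (w ^ (BASE - 1)) + carry
        out.append(s & (BASE - 1))
        carry = s >> 32
    return out

# negating [5, 0] (the value 5) gives -5 = [0xFFFFFFFB, 0xFFFFFFFF]
assert negate_twos_complement([5, 0]) == [0xFFFFFFFB, 0xFFFFFFFF]
assert negate_sign_magnitude(1, [5]) == (-1, [5])
```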

Upvotes: 2
