financial_physician

Reputation: 1978

Which number can computers retain better: `256,007` or `.000333`?

I've always assumed that 256,007 would take up less space and be subject to less error than a number like .000333.

For context of the question, in an engineering course we're supposed to show how LU decomposition with pivoting is more numerically stable than without pivoting. The book argues that permuting the largest pivot into place is more stable but it seems to me like the smallest number would be best so that you don't end up with tiny decimals.

Upvotes: 0

Views: 44

Answers (1)

Eric Postpischil

Reputation: 222362

Floating-point numbers are most often stored using a fixed format, that is, a predetermined format with an unchanging size.

The most commonly used formats are the IEEE-754 “single precision” and “double precision” formats. The former uses three fields to encode a number (or certain special “values” such as Not a Number values). The first field is a single bit S that specifies the sign (− or +). The second field is eight bits that specify an exponent code, E. The third field is 23 bits that contain the significand bits, F. The number represented is:

  • If E is 0, the number is (−1)^S • 0.F₂ • 2^(1−127). “0.F₂” represents writing “0.” followed by the 23 bits of F and interpreting it as a binary numeral, so that, for example, 0.11000000000000000000000 represents ¾.
  • If E is 1 to 254, the number is (−1)^S • 1.F₂ • 2^(E−127).
  • If E is 255 and F is zero, the number is an infinity, (−1)^S • ∞.
  • If E is 255 and F is not zero, the value is a Not a Number (NaN).
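For illustration, the three fields can be pulled apart in Python with the standard struct module. This is just a sketch of the decoding rules above, using 0.75 (which is stored in normalized form as 1.1₂ • 2^(−1), so the “E is 1 to 254” rule applies):

```python
import struct

def decode_float32(x):
    """Split a single-precision value into its sign, exponent-code, and significand fields."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))  # the raw 32 bits
    S = bits >> 31            # 1 sign bit
    E = (bits >> 23) & 0xFF   # 8 exponent-code bits
    F = bits & 0x7FFFFF       # 23 significand bits
    return S, E, F

# 0.75 is 1.1 in binary times 2^-1, so E = -1 + 127 = 126,
# and F holds the 23 bits after the leading "1." (here 100...0).
S, E, F = decode_float32(0.75)
value = (-1) ** S * (1 + F / 2**23) * 2 ** (E - 127)  # the normalized-number rule
print(S, E, F, value)
```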

As you can see, this format always uses 32 bits for any number it represents; it does not use more bits for 256,007 or fewer for .000333. Also, this format cannot represent .000333 exactly. The closest value it can represent is 0.00033300000359304249286651611328125.
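You can check both points yourself by round-tripping the values through the format; converting the result to Decimal then displays the stored value exactly. A small sketch:

```python
import struct
from decimal import Decimal

def to_float32(x):
    """Round x to the nearest single-precision value and return it as a Python float."""
    (f,) = struct.unpack("<f", struct.pack("<f", x))
    return f

# Both numbers occupy exactly the same 4 bytes in this format.
assert struct.calcsize("f") == 4

# 256,007 is an integer below 2**24, so it survives unchanged...
print(Decimal(to_float32(256007.0)))
# ...but .000333 does not; printing the stored value exactly shows the rounding.
print(Decimal(to_float32(0.000333)))
```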

The errors that occur in working with these numbers do depend on the magnitudes of the numbers. In any operation, the result must be rounded to fit in the format, and how close you can get to the exact result is partly determined by the exponent.
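The magnitude dependence is visible in the spacing between adjacent representable numbers: the gap (one “unit in the last place”) scales with the exponent, so the absolute rounding error near a large number is much bigger than near a small one, while the relative error is comparable. A sketch for double precision, using math.ulp (Python 3.9+):

```python
import math

# The gap to the next representable double scales with the magnitude:
print(math.ulp(256007.0))   # 2**-35, about 2.9e-11 near 256,007
print(math.ulp(0.000333))   # 2**-64, about 5.4e-20 near .000333

# The relative spacing (gap / magnitude) is nearly the same in both cases,
# because both numbers carry the same number of significand bits.
print(math.ulp(256007.0) / 256007.0)
print(math.ulp(0.000333) / 0.000333)
```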

Textbooks about numerical analysis are generally written with these formats in mind. The types of errors the textbooks consider are those that are caused by these floating-point formats. The reason for choosing one pivot over another has to do with how the numbers in the calculations interact with each other (not just from the format alone).
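To make the pivot-choice point concrete, here is a standard textbook-style 2×2 example (my illustration, not taken from any particular book): eliminating with a tiny pivot destroys the solution in double precision, while swapping the rows to use the larger pivot recovers it.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Gaussian elimination on a 2x2 system, pivoting on a11 as given."""
    m = a21 / a11                 # multiplier; huge if the pivot a11 is tiny
    a22p = a22 - m * a12          # eliminated second row
    b2p = b2 - m * b1
    x2 = b2p / a22p
    x1 = (b1 - a12 * x2) / a11    # back substitution
    return x1, x2

eps = 1e-20
# System: eps*x1 + x2 = 1,  x1 + x2 = 2; the true solution is close to x1 = x2 = 1.

# No pivoting: the tiny eps is the pivot. The huge multiplier 1/eps swamps
# the second row, and x1 comes out as 0 -- completely wrong.
print(solve_2x2(eps, 1.0, 1.0, 1.0, 1.0, 2.0))

# Partial pivoting: swap the rows so the larger entry 1 is the pivot.
print(solve_2x2(1.0, 1.0, eps, 1.0, 2.0, 1.0))
```

The failure is not the tiny pivot by itself; it is that dividing by it produces intermediate values so large that the original second row is rounded away entirely.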

There are formats that use varying amounts of memory. These are mostly “arbitrary precision” formats, meaning the precision, and the amount of memory used for representing numbers, is not fixed in advance but is adjusted for circumstances. If we were working with such numbers, the numerical analysis would be different because we could, in theory, make the final error as small as desired simply by using more and more memory, regardless of the magnitudes of the values involved. In other words, instead of choosing a large pivot, we could choose a small pivot but request lots and lots of precision.
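Python's decimal module is one such adjustable-precision arithmetic. As a sketch, take the textbook-style system eps·x1 + x2 = 1, x1 + x2 = 2 (an illustration of mine) and eliminate with the tiny pivot eps anyway; at 50 significant digits the huge intermediate values are carried accurately enough that x1 still comes out close to the true answer:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50   # request 50 significant digits everywhere

eps = Decimal("1e-20")
one, two = Decimal(1), Decimal(2)

# System: eps*x1 + x2 = 1,  x1 + x2 = 2; true solution x1 = 1/(1 - eps).
m = one / eps            # huge multiplier, but we have digits to spare
a22 = one - m * one      # eliminated second row
b2 = two - m * one
x2 = b2 / a22
x1 = (one - x2) / eps    # back substitution

# With 50 digits, x1 comes out close to 1 even with the tiny pivot,
# unlike the same elimination in fixed-precision double arithmetic.
print(x1)
```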

Upvotes: 2
