dida1007
dida1007

Reputation: 11

Implementation of a floating point library of any precision

I have to implement floating point library of any precision so the exponent and mantissa must be an infinitely positive integer. Later I have to create adding and subtracting functions etc. using the processor's x86 capabilities. I have to use C++/C (I can use assembly inserts) and I can use ready-made libraries.

At the beginning I have problems: 1. what type use to store the infinitely large positive integer? 2.what libraries / functions will allow me to use the processor's capabilities and will work with the above data type?

Upvotes: 1

Views: 527

Answers (2)

Gem Taylor
Gem Taylor

Reputation: 5635

One important consideration is what base you will use for your internal unit representation. As others have said, this will probably be a vector of individual units that you then process using those "on paper" operations you learnt in primary school. Perhaps a struct that holds separate vectors for mantassa and exponent and a flag for the sign of each of them. But what goes in those individual units?

This choice is flavoured by the efficiency of those operations.

You might decide that using characters and base 10 is the easiest way to code this, and the read and write functions are very trivial! But actually the processor is quite inefficient at processing one byte at a time in decimal. It is great for reading the numbers in, though! For example doing 56 * 37 would require the following operations on "decimal" architecture:

  • 6 * 7 -> 42
  • 42 / 10 -> ________ 4
  • 42 %10 -> ___________ 2*
  • 5 * 7 -> 35
  • 35 / 10 -> ______ 3
  • 35 %10 -> ________ 5
  • 6 * 3 -> 18
  • 18 / 10 -> ______ 1
  • 18 %10 -> ________ 8
  • 5 * 3 -> 15
  • 15 / 10 -> ____ 1
  • 15 %10 -> ______ 5
  • 4 + 5 + 8 -> 17
  • 17 / 10 -> ______ 1
  • 17 %10 -> ________ 7*
  • 3 + 1 + 5 + 1 -> 10
  • 10 / 10 -> ____ 1
  • 10 %10 -> ______ 0*
  • 1 + 1 -> 2
  • 2 /10 -> 0
  • 2 % 2 -> ____ 2*
  • Final answer __ 2 0 7 2

Which is a lot of multiply and divide operations, and that is only 2 digits!

You might decide that using the whole byte 0-255 is much more efficient, and doing base 256 math is much simpler. It is, especially as the digit splitting /10 and %10 steps become simple shifts and masks! But reading and writing arbitrary length numbers in and out of the display in decimal is much, much harder, and made even more difficult if you decide to have a base 2 exponent!

Doing the same math operations in base 65536 is more efficient, and base 4G on a 64-bit processor is even more efficient for the math, and not much more complex for the printing. With assembler you can even use 64bit storage and 128 bit multiply operations!

Each time you double the internal word size you reduce the number of partial operations for multiplication by 4, and for division the reduction is even more significant!

One cheaty way to get the printing efficient and keep the maths reasonably efficient is to actually store a number of digits as binary in each storage unit. A 32bit store can actually hold an integer number up to 9 digits long in each unit. By doing math in base 1 billion, you only have to do all the extra digit splitting operations as shown above every 9 digits!

Reading a number string in involves reading 9 digits at a time into storage, then fixing the exponent, which I recommend is still in powers of 10. Numbers should always be internally held in a normalised float format, where there is only one digit (or perhaps always a 0) above the decimal point. If the input string contains the decimal point at a different location, or is actually an integer value, then you can fix it by fiddling the exponent value. Numbers are always printed in that format or as integers. This way, the unused digits of the number are always in the bottom of the least significant unit, and if they are zero, then they are fully functional.

In this schema, adding or subtracting 2 numbers that have different exponents requires a bit of extra work, as one of the numbers must be split into the billion alignment of the other before they can be summed, but the addition is still very cheep compared to multiplication or division.

Reading the exponent value itself into the hold vector is tricky, but perhaps you can live with an exponent that is limited to the 64-bit range, which means you don't have to do multibyte manipulation on the exponent number itself? That allows numbers up to 0.99E10sextillion (that's an 19 digit number just for the exponent), which should be enough for most purposes. This assumes the exponent is still representing a power of 10, which makes storage of the values convenient, or you could do it in powers of billions, increasing the range "exponentially", but then the text number reading is more difficult, as the read numbers have to be realigned (a similar operation to the add/subtract pre-alignment), but you do also benefit from not having to do that add/subtract pre-alignment.

Upvotes: 0

eerorika
eerorika

Reputation: 238451

  1. what type use to store the infinitely large positive integer?

A vector of integers. Just like a fixed size integer consists of bytes - which are like smaller integers themselves - this large integer consists of smaller parts. Unlike the fixed size integers however, the vector can grow arbitrarily large (until memory runs out).

You can wrap the vector in a custom type in C++ to provide an object oriented interface.

2.what libraries / functions will allow me to use the processor's capabilities and will work with the above data type?

If you create a custom type, other libraries won't have direct support for it. Generic template libraries may however be used if your class provides a compatible interface.

For example, if you provide a comparison operator and assignment operator, then there should be no problem sorting your custom class objects using std::sort from the standard library.

I can use ready-made libraries.

This is usually a good idea. I recommend it.

Upvotes: 3

Related Questions