Reputation: 3667
I was wondering what the tradeoffs are when implementing 32-bit IEEE-754 floating point division: using LUTs versus the Newton-Raphson method?
When I say tradeoffs I mean in terms of memory size, instruction count, etc.
I have a small memory (130 words, each 16 bits wide). I am storing the upper 12 bits of the mantissa (including the hidden bit) in one memory location and the lower 12 bits of the mantissa in another location.
Currently I am using Newton-Raphson division, but I am considering the tradeoffs of changing my method. Here is a link to my algorithm: Newton's Method for finding the reciprocal of a floating point number for division
Thank you and please explain your reasoning.
Upvotes: 3
Views: 2642
Reputation: 19601
Each Newton-Raphson step roughly doubles the number of digits of precision, so if you can work out how many bits of precision you expect from a LUT of a particular size, you can work out how many NR steps you need to attain your desired precision. The Cray-1 used NR as the final stage of its reciprocal calculation. While looking into this, I found a fairly detailed article on this sort of thing: An Accurate, High Speed Implementation of Division by Reciprocal Approximation from the 9th IEEE Symposium on Computer Arithmetic (September 6-8, 1989).
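To make the step-counting concrete, here is a minimal C sketch of the idea (the table size, the bit extraction, and the names recip_lut/nr_recip are my own illustration, not taken from the article). A 16-entry midpoint table indexed by the top 4 fraction bits of a y already normalized into [1, 2) gives a seed good to roughly 5 bits, so three doubling steps reach single precision's 24 bits (5 -> 10 -> 20 -> ~24):

    #include <stdint.h>
    #include <string.h>

    /* 16-entry seed table for 1/y with y normalized into [1, 2),
     * indexed by the top 4 fraction bits; each entry is the exact
     * reciprocal of its interval's midpoint (about 5 good bits). */
    static const float recip_lut[16] = {
        1.0f/1.03125f, 1.0f/1.09375f, 1.0f/1.15625f, 1.0f/1.21875f,
        1.0f/1.28125f, 1.0f/1.34375f, 1.0f/1.40625f, 1.0f/1.46875f,
        1.0f/1.53125f, 1.0f/1.59375f, 1.0f/1.65625f, 1.0f/1.71875f,
        1.0f/1.78125f, 1.0f/1.84375f, 1.0f/1.90625f, 1.0f/1.96875f,
    };

    /* Reciprocal of y in [1, 2): LUT seed, then Newton-Raphson steps
     * x' = x*(2 - y*x), each roughly doubling the correct bits:
     * ~5 -> ~10 -> ~20 -> ~24 (capped by the float format itself).
     * Exponent handling (negate and rebias) is a separate step.    */
    float nr_recip(float y)
    {
        uint32_t bits;
        memcpy(&bits, &y, sizeof bits);           /* raw IEEE-754 encoding */
        float x = recip_lut[(bits >> 19) & 0xF];  /* top 4 fraction bits   */
        x = x * (2.0f - y * x);                   /* step 1: ~10 bits      */
        x = x * (2.0f - y * x);                   /* step 2: ~20 bits      */
        x = x * (2.0f - y * x);                   /* step 3: ~24 bits      */
        return x;                                 /* X/Y = X * nr_recip(Y) */
    }

A 256-entry table would start near 9 bits and need only two steps -- that is exactly the memory-versus-instruction-count dial being asked about. Note that the later steps run in float arithmetic, so a correctly rounded IEEE quotient still needs a final correction beyond this sketch.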
Upvotes: 3
Reputation: 490148
The trade-off is fairly simple. A LUT uses extra memory in the hope of reducing the instruction count enough to save some time. Whether it's effective will depend a lot on the details of the processor -- caching in particular.
For Newton-Raphson, you change X/Y to X * (1/Y) and use your iteration to find 1/Y. At least in my experience, if you need full precision, it's rarely useful -- its primary strength is in allowing you to find something to (say) 16-bit precision more quickly.
The usual method for division is a bit-by-bit method. Although that particular answer deals with integers, for floating point you do essentially the same thing, except that alongside it you subtract the exponents. A floating point number is basically A * 2^N, where A is the significand and N is the exponent part of the number. So, to divide two numbers (A * 2^N) / (B * 2^M), you carry out the division as (A/B) * 2^(N-M), with A and B treated as (essentially) integers in this case. The only real difference is that with floating point you normally want to round rather than truncate the result. That basically just means carrying out the division to (at least) one extra bit of precision, then rounding up if that extra bit is a one.
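As a sketch of that in C (my own illustration, happy path only: normalized inputs, with signs, zeros, infinities, NaNs, and subnormals all left out):

    #include <stdint.h>

    /* Illustrative unpacked form: 24-bit significand with the hidden
     * bit set (so sig is in [2^23, 2^24)) plus an unbiased exponent. */
    typedef struct { uint32_t sig; int exp; } unpacked;

    static unpacked fdiv_core(unpacked a, unpacked b)
    {
        unpacked q;
        q.exp = a.exp - b.exp;        /* dividing => subtract exponents */

        uint32_t rem = a.sig;
        if (rem < b.sig) {            /* quotient would be < 1, so      */
            rem <<= 1;                /* pre-shift to keep it in [1, 2) */
            q.exp -= 1;
        }

        /* Bit-by-bit (restoring) division: 24 significand bits plus
         * one extra bit of precision for rounding.                    */
        uint32_t quot = 0;
        for (int i = 0; i < 25; i++) {
            quot <<= 1;
            if (rem >= b.sig) {       /* trial subtraction succeeds?    */
                rem -= b.sig;
                quot |= 1;
            }
            rem <<= 1;
        }

        /* Round to nearest: the extra bit is the round bit, and any
         * nonzero leftover remainder means sticky bits exist.         */
        uint32_t round_bit = quot & 1;
        uint32_t sticky = (rem != 0);
        quot >>= 1;
        if (round_bit && (sticky || (quot & 1)))
            quot += 1;
        if (quot == (1u << 24)) {     /* rounding carried all the way   */
            quot >>= 1;               /* out: renormalize               */
            q.exp += 1;
        }
        q.sig = quot;
        return q;
    }

One refinement over the plain "round up if the extra bit is a one": when the round bit is 1 and everything below it is 0, this sketch sends the tie to even via the sticky test, which is what IEEE round-to-nearest requires.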
The most common method using lookup tables is SRT division. This is most often done in hardware, so I'd probably Google for something like "Verilog SRT" or "VHDL SRT". Rendering it in C++ shouldn't be terribly difficult though. Where the method I outlined in the linked answer produces one bit per iteration, SRT can be written to produce 2, 4, etc. bits per iteration. If memory serves, the size of the table grows quadratically with the number of bits produced per iteration though, so you rarely see much more than 4 in practice.
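For a feel of what the iteration does, here is a radix-2 rendering in C (my own illustrative framing -- real implementations are hardware, usually radix 4 with a proper quotient-digit selection table):

    #include <stdint.h>

    /* Fixed-point scale: dividend n and divisor d are both normalized
     * into [1/2, 1) and represented as value * SCALE.                 */
    #define SCALE (1 << 24)
    #define HALF  (SCALE / 2)

    /* Radix-2 SRT: returns floor((n/d) * 2^(k-1)) after k digit steps.
     * The partial remainder r stays in [-d, d); digits are -1, 0, +1. */
    int64_t srt2_divide(int64_t n, int64_t d, int k)
    {
        int64_t r = n >> 1;  /* start at n/2 so r is already in [-d, d) */
        int64_t q = 0;       /* signed digits fold straight into binary */

        for (int i = 0; i < k; i++) {
            r *= 2;          /* shift the remainder (written as *2 since
                                << on negative values is UB in C)       */
            if (r >= HALF) {          /* selection looks only at the top */
                q = 2*q + 1;          /* bits of r -- never at d -- which */
                r -= d;               /* is what makes it cheap           */
            } else if (r < -HALF) {
                q = 2*q - 1;          /* a -1 digit absorbs an earlier    */
                r += d;               /* overshoot: nothing is restored   */
            } else {
                q = 2*q;              /* digit 0: no subtraction at all   */
            }
        }
        if (r < 0)           /* final fixup so the result is a true floor */
            q -= 1;
        return q;
    }

For example, srt2_divide(3 * (SCALE / 4), SCALE / 2, 25) returns 1.5 * 2^24, i.e. the quotient scaled by 2^(k-1). The payoff is the selection step: the digit comes from a couple of top bits of the partial remainder alone rather than a full comparison against the divisor, and a higher radix replaces the two compares with the lookup table whose growth is mentioned above.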
Upvotes: 4