Reputation: 1085
I'm curious how fast floating-point operations are on dedicated hardware compared with fixed point.
With fixed point, say you have the number 555 and you want to multiply by 1.54, you'd load the values 555, 154, and 100 into registers (three clocks), multiply 555 by 154 (four clocks), and divide by 100 (four clocks), then load the output register into memory (one clock).
With floating point, you'd load the values 555 and 1.54 into registers (two clocks), get the larger scaling/exponent (one clock), subtract the scalings (one clock), normalize one (four clocks for a multiply), multiply (four clocks), then save the output register into memory (one clock).
That's 12 clocks for fixed point and 13 for floating point. So am I missing something and there is a real performance benefit, or is it all just ease of use? I assumed four clocks for a multiply; obviously it won't be the same on all processors, but it gives a general idea.
Upvotes: 2
Views: 1249
Reputation: 26185
For modern processors, it is not very useful to look at performance of individual instructions. Many are both pipelined and superscalar, so multiple instructions are at various stages of being executed at the same time.
Calculating performance from processor specifications and code is extremely difficult. I would generally only do it to estimate performance of a processor that does not yet exist. If the processor and the code both exist, it is much easier to measure.
Upvotes: 1
Reputation: 4462
Using fixed-point numbers, 1.54 would already be stored in the proper format for you. You would then just need an integer multiplication, a constant addition to compensate for rounding, and some shifting to get the radix point correct. See Wikipedia: Q (number format).
E.g. assuming Q15.16 format (a sign bit, 15 bits for the integer part, 16 bits for the fractional part), you could perform the multiplication like this:
int32_t a = 555 * (1 << 16);  /* 555 in Q15.16, computed at compile time */
int32_t b = 1.54 * (1 << 16); /* 1.54 in Q15.16, also a compile-time constant */
int64_t temp = ((((int64_t) a) * b) + (1 << 15)) >> 16; /* multiply, round, shift back */
int32_t result = (int32_t) temp; /* 854.7 in Q15.16 */
No division or other calculation is needed to get 1.54 into b (unless you need to read input from outside). With your assumed CPU that is two loads, one double-width multiply, an addition of a constant (already in a register if you perform many multiplications at once), and a shift.
With dedicated hardware (e.g. DSP cores or instruction set extensions) some Qn.m types would be directly supported, also in vectorized fashion. Though with higher-end CPUs (modern Intel or AMD), I believe there is probably not much benefit in using fixed point, as floating-point instructions are already very efficient.
Upvotes: 1