Fixed Point Binary Multiplication Over Flow Problem

Question

I am doing 32bit unsigned multiplication of two fixed point integers like below:

888.88 x 805.00 = 7,155,484,000 (greater then 32 bit)

But i need the result like below:
888.88 x 805.00 = 715,548.4

I am doing this in PIC16 Assembly language, I don't want to use Floating Point Routine due to Program memory limitation. Is there is any other way or trick to get the required result within 32 bits?

I need help in this issue! Please answer if possible.

njuffa · Accepted Answer

The question does not specify a fixed-point format. I will assume a binary fixed-point format, rather than a decimal one possibly based on BCD encoding. Since the example shows numbers with two decimals and the question specifies unsigned operands, I choose UQ24.8 format mapped to a uint32_t, with 24 bits allocated to the integer portion and 8 bits allocated to the fraction portion.

Each fixed-point number contains an implicit scale factor based on the number of fraction bits. For a UQ24.8 operand this scale factor is 2⁸ = 256. When we use integer multiplication to affect fixed-point multiplication, the scale factor is included in the product twice, so we need to divide by the scale factor to get the desired UQ24.8 result with the scale factor applied once. For binary fixed-point arithmetic that division is simply a right shift by the number of fraction bits.

We can also round the result of the multiplication by adding half of the scale factor prior to the division. In my experience the overhead of a rounded fixed-point multiplication is low enough on most platforms to make it worthwhile by preventing numerical drift in computations due to the biased error caused by truncation.

I am not familiar with PIC assembly language, but the ISO-C99 code below demonstrates the principle. The output of this little program should look like this:

factor             a = 000378e1    888.879
factor             b = 00032503    805.012
truncated product: p = 0aeb25ef 715557.934
rounded product:   p = 0aeb25f0 715557.938

For decimal fixed-point arithmetic the mechanics of fixed-point multiplication are the same, except that the scale factor is a power of 10, requiring a relatively expensive division rather than a simple extended-width right shift. However, this division is by a constant so it can be replaced with multiplication with the reciprocal, something that any optimizing compiler should know how to do, as a general algorithm has been known since the 1990s.

#include 
#include 
#include 
#include 
#include 

#define UQ24P8_FRACT_BITS (8)
#define UQ24P8_INTGR_BITS (24)
#define UQ24P8_SCALE_FACT (1L << UQ24P8_FRACT_BITS)
#define UQ24P8_OVFL_LIMIT (1L << UQ24P8_INTGR_BITS)

typedef uint32_t uq24p8;

uq24p8 mul_uq24p8 (uq24p8 a, uq24p8 b, int round)
{
    uint64_t p = ((uint64_t)a) * b;
    if (!round) {
        return (uq24p8)(p / UQ24P8_SCALE_FACT); // p >> UQ24P8_FRACT_BITS
    } else {
        return (uq24p8)((p + (UQ24P8_SCALE_FACT / 2)) / UQ24P8_SCALE_FACT);
    }
}

uq24p8 float_to_uq24p8 (float a)
{
    float t = roundf (a * UQ24P8_SCALE_FACT);
    if ((t < 0) || (t >= UQ24P8_OVFL_LIMIT)) {
        fprintf (stderr, "float_to_uq24p8: conversion error
");
    }
    return ((uq24p8)t);
}

double uq24p8_to_double (uq24p8 a)
{
    return ((double)a) / UQ24P8_SCALE_FACT;
}

int main (void)
{
    uq24p8 a = float_to_uq24p8 (888.88f);
    uq24p8 b = float_to_uq24p8 (805.01f);
    uq24p8 p;

    printf ("factor             a = %08" PRIx32 " %10.3f
", a, uq24p8_to_double(a));
    printf ("factor             b = %08" PRIx32 " %10.3f
", b, uq24p8_to_double(b));
    p = mul_uq24p8 (a, b, 0);
    printf ("truncated product: p = %08" PRIx32 " %10.3f
", p, uq24p8_to_double(p));
    p = mul_uq24p8 (a, b, 1);
    printf ("rounded product:   p = %08" PRIx32 " %10.3f
", p, uq24p8_to_double(p));
    return EXIT_SUCCESS;
}

Fixed Point Binary Multiplication Over Flow Problem

Answers (1)

Related Questions