Luciano Martinez Rau

Reputation: 13

Incorrect multiplication result using fixed point in C

I'm trying to implement signed-by-unsigned multiplication in C using fixed-point arithmetic, but I get an incorrect result and can't see how to fix it. I think the problem is in the sign extension of the bits. Here is the code:

int16_t audio_sample=0x1FF;      //format signed Q1.8 -> Value represented=-0.00390625
uint8_t gain=0xA;                //format unsigned Q5.2 -> Value represented = 2.5
int16_t result = (int16_t)((int32_t)audio_sample * (int32_t)gain);
printf("%x",result);

The result from printf is 0x13F6, which is of course 0x1FF * 0xA, but fixed-point arithmetic says the correct result should be 0x3FF6 once the operands are properly sign-extended. In Q6.10 format, 0x3FF6 represents -0.009765625 = -0.00390625 * 2.5.

Please help me find my mistake.

Thanks in advance.

Upvotes: 1

Views: 370

Answers (2)

John McFarlane

Reputation: 6087

It is best to think of fixed-point as a matter of scaling, and to express your calculation simply and clearly in terms of numbers — rather than bits. (Example)

A Q1.8 or Q5.2 number in AMD Q notation is a real number scaled by a factor of 2^8 or 2^2 respectively.

But C doesn't have 9 or 7-bit number types. Your int16_t and uint8_t variables have enough range to store such numbers. But for arithmetic operations, it is unwise to use unsigned integers, or to mix signed and unsigned types. int has enough range and avoids some efficiency pitfalls.

int audio_sample = -0.00390625*256;  // Q1.8
int gain = 2.5*4;  // Q5.2

The product of numbers scaled by 2^8 and 2^2 has a scale of 2^10.

int result = audio_sample * gain;  // Q6.10

To convert back to the real value, divide by the scale factor.

printf("%lg * %lg = %lg\n",
    (double)audio_sample/256,
    (double)gain/4,
    (double)result/1024);

"Please help me find my mistake."

The mistake was in assigning 0x1FF to audio_sample, instead of -1. 0x1FF is the unsigned truncation of the 9-bit two's-complement value -1. But audio_sample is wider and would require more leading 1 bits. It would have been clearer and safer to express your intent by assigning -0.00390625*256 to audio_sample.

"the fixed-point arithmetics said that the correct results would be 0x3FF6, considering the proper bit-extension"

0x3FF6 is the unsigned 14-bit truncation of the correct two's-complement answer. But the result requires 16 bits, so you're probably looking for the value 0xFFF6.

printf("unsigned Q6.10: 0x%x\n", (unsigned)result & 0xFFFF);

Upvotes: 0

mch

Reputation: 9804

You should use unsigned types here. The representation is in your head (or the comments), not in the data types in the code.

Two's complement means the leading 1 is conceptually repeated forever to the left. For example, 0x1FF in Q1.8 is the same as 0xFFFF in Q8.8 (-1 / 256).

If you have a 16-bit integer, you cannot have Q1.8; it will always be Q8.8, because the machine will not ignore the other bits. So 0x1FF in Q1.8 should be 0xFFFF in Q8.8. The 0xA in Q5.2 does not change in Q6.2.

0xFFFF * 0xA = 0x9FFF6. Cut away the overflow (which is why you should use unsigned) and you have 0xFFF6 in Q6.10, which is -10 / 1024, your expected result.

Upvotes: 4
