Reputation: 1109
I am implementing a VHDL 8-bit fixed-point multiplication module which returns an 8-bit truncated number, but I have a problem when I do multiplications by hand in order to test it. The problem arises when I want to multiply two negative numbers.
I tried multiplying two positive values, 1.67 * 0.625 ~ 1.04 (0.906 in binary multiplication):
001.10101 -> 1.67
000.10100 -> 0.625
------------
000000.1110111010 = 000.11101 (truncated to 8bits = 0.906)
I tried multiplying a negative and a positive number (-0.875 * 3 ~ -2.62):
111.00100 -> -0.875
011.00000 -> 3
----------
010101.0110000000 = 101.01100 (truncated to 8bits = -2.625)
So far everything is working properly. The problem comes when I try to multiply two negative numbers. According to what I know (unless I'm mistaken):
- multiplying two numbers gives a result with twice the width (multiply two 8-bit numbers and you get a 16-bit number)
- the binary point moves as well. In this example there are 3 bits before the point and 5 bits after it, so the result has 6 bits before the point and 10 bits after it.
With these assumptions the above calculations worked properly. But when I try to multiply two negative values (-0.875 * -1.91 ~ 1.67):
110.00010 -> -1.91 (-1.9375)
111.00100 -> -0.875
------------
101011.0011001000 = 011.00110(truncated to 8 bits = 3.1875)
Naturally, I tried another negative multiplication (-2.64 * -0.875 = 2.31)
101.01011 -> -2.64
111.00100 -> -0.875
----------
100110.0001001100 = 110.00010 (truncated to 8bits = -1.9375)
Clearly I'm doing something wrong, but I just can't see what I'm doing wrong.
PS: I haven't implemented it yet. The thought came to me once I had figured out how I was going to do it, and then I tried to test it by hand with some simple examples. I thought that maybe the first ones worked out because I was lucky, but apparently not: I tried a few more multiplications and they worked too. So maybe I'm doing something wrong when multiplying two negative numbers; maybe I'm truncating it wrong? Probably.
EDIT: OK, I found a Xilinx document that states how multiplication is done when the two operands are negative, here is the link. According to this document, this can only be done when doing extended multiplication: the last partial sum of the multiplication must be inverted and have 1 added to it, which results in the correct number.
To do the multiplications I used the Windows calculator in programmer mode, which means that to multiply the 8-bit numbers I entered them in the calculator, took the result and truncated it. If that worked for the other cases, it means the Windows calculator is doing a direct multiplication (adding all the partial sums as they are instead of inverting the last partial sum). So, to obtain the real result, I should subtract the first operand from the final result and then add the first operand inverted + 1:
110.00010 -> -1.91 (-1.9375)
111.00100 -> -0.875
------------
101011.0011001000
Which gave me the result: 000010.0111001000 = 010.01110(truncated to 8bits =2.43)
And with the other one I came up with the result of 1.875. Those outputs aren't exactly great, but at least they are closer to what I expected. Is there an easier way to do this?
Upvotes: 3
Views: 2868
Reputation: 3659
Your intermediate results are wrong, so the truncation did not work as expected. Moreover, truncation without overflow is only possible if the four top-most bits of the intermediate result are equal in your format. You should use signed data types to do the multiplication right.
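To check hand calculations like these, a small software model can help. The following is only a sketch in Python, not VHDL; the names `to_signed` and `multiply_q3_5` are made up for illustration, and it assumes the format from the question (3 integer bits including the sign, 5 fraction bits). It multiplies the operands as signed values and reports whether the four top-most bits of the 16-bit product are equal:

```python
FRAC_BITS = 5   # assumed format: 3 integer bits (incl. sign) + 5 fraction bits
WIDTH = 8

def to_signed(raw, width):
    """Interpret a raw bit pattern of the given width as two's complement."""
    raw &= (1 << width) - 1
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw

def multiply_q3_5(a_raw, b_raw):
    """Return (16-bit product pattern, 8-bit truncated pattern, fits flag)."""
    product = to_signed(a_raw, WIDTH) * to_signed(b_raw, WIDTH)
    full = product & 0xFFFF                  # 6 integer bits, 10 fraction bits
    truncated = (full >> FRAC_BITS) & 0xFF   # keep bits 12..5
    top4 = full >> 12                        # bits 15..12
    fits = top4 in (0b0000, 0b1111)          # all equal: no overflow
    return full, truncated, fits

if __name__ == "__main__":
    full, trunc, fits = multiply_q3_5(0b11000010, 0b11100100)  # -1.9375 * -0.875
    print(f"{full:016b} {trunc:08b} {fits}")
    print(to_signed(trunc, WIDTH) / 2**FRAC_BITS)
```

For the operands -4 * -4 (pattern `10000000` twice) the flag comes back false, since 16 does not fit into the 8-bit format.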
Even your second example is wrong. The intermediate binary result 010101.0110000000 represents the decimal number 21.375, which is not the product of -0.875 and 3. So, let's do the multiplication by hand:
a * b = -0.875 * 3 = -2.625
  111.00100 * 011.00000
  ---------------------
        .00000000   // further lines containing only zeros have been omitted
+       .01100000
+    011.00000
+   0110.0000
+ 110100.000        // add -(2^2) * b !
= 111101.0110000000 = -2.625 (intermediate result)
=    101.01100      = -2.625 after truncation
You have to add the two's complement of b in the last partial sum because the '1' in the top-most bit of a represents the value -(2^2) = -4. Truncation without overflow is possible here because the 4 top-most bits of the intermediate result are equal.
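The hand calculation above can be mirrored in a few lines of Python. This is a minimal sketch with a hypothetical function name, `shift_add_signed`; it relies on Python integers being signed and unbounded, so the partial sums need no explicit sign extension here. The only special case is the top bit of a, whose partial sum is subtracted because that bit has a negative weight:

```python
def shift_add_signed(a_raw, b_raw, width=8):
    """Multiply two two's complement bit patterns by summing partial sums."""
    # Interpret b as a signed value.
    b = b_raw - (1 << width) if b_raw & (1 << (width - 1)) else b_raw
    total = 0
    for i in range(width):
        if (a_raw >> i) & 1:
            partial = b << i          # b * 2**i
            if i == width - 1:
                total -= partial      # top bit of a weighs -2**(width-1)
            else:
                total += partial
    return total

if __name__ == "__main__":
    raw = shift_add_signed(0b11100100, 0b01100000)   # -0.875 * 3
    print(raw, raw / 2**10)   # scale by 2**10 for the 10 fraction bits
```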
And now the third example
a * b = -1.9375 * -0.875 = 1.6953125
  110.00010 * 111.00100
  ---------------------
        .00000000   // further lines containing only zeros have been omitted
+ 111111.111100100   // sign-extended partial-sum
+ 111110.0100        // sign-extended partial-sum
+ 000011.100         // add -4 * b
= 000001.101100100 = 1.6953125 (intermediate result)
~    001.10110     = 1.6875 after truncation
As b is a signed number, one always has to sign-extend the partial sums to the width of the intermediate result. Of course, this has also been done in the calculation of the second example, but there it does not make a difference.
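To see the effect of the sign extension in isolation, here is a rough Python sketch that keeps every partial sum in 16 bits, as hardware would. The function name `product_16bit` and its `sign_extend` switch are made up for illustration:

```python
MASK16 = 0xFFFF

def product_16bit(a_raw, b_raw, sign_extend=True):
    """Sum 16-bit partial sums of two 8-bit two's complement patterns."""
    if sign_extend and b_raw & 0x80:
        b16 = b_raw | 0xFF00      # replicate the sign bit into bits 15..8
    else:
        b16 = b_raw               # zero-extended: b is treated as unsigned
    total = 0
    for i in range(8):
        if (a_raw >> i) & 1:
            partial = (b16 << i) & MASK16
            if i == 7:
                # top bit of a has weight -2**7: add the two's complement
                partial = (-partial) & MASK16
            total = (total + partial) & MASK16
    return total

if __name__ == "__main__":
    a, b = 0b11000010, 0b11100100            # -1.9375 and -0.875
    print(f"{product_16bit(a, b):016b}")                     # with sign extension
    print(f"{product_16bit(a, b, sign_extend=False):016b}")  # without
```

With a positive b, such as 011.00000 in the second example, both variants return the same pattern, which is why the extension made no visible difference there.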
Upvotes: 3