Is this hardware-level parallel multiplication algorithm correct?

Question

I'm an undergraduate student studying a section in our computer architecture textbook that introduces a hardware-level parallel multiplication algorithm. As depicted in the figure, the algorithm involves a sequential peeling process during the addition phase, where bits are "peeled off" from both front and back in the order of 1, 1, 2, 4, 8, 16, going directly to the output from a higher level of the tree.

This suggests that a 64-bit multiplication can be completed in log_2(64) = 6 iterations using only a 64-bit adder.

I understand why the "peeling off" process works for the lower bits, but I'm not sure it is equally valid for the higher bits. Shouldn't carries generated in the lower positions be taken into account before a high bit is finalized? This makes me suspect that the book's description may be inaccurate.

To illustrate my point, consider the multiplication of 4'b1111 and 4'b1000. The result should be 8'b01111000, but it seems impossible to peel off a '0' in the first step in this case. Could someone clarify this for me?
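
To make the concern concrete, here is a minimal Python sketch. It assumes a plain shift-and-add view of the partial products and is not the tree from the book's figure; the helper names `partial_products` and `top_bit` are made up for illustration. It checks the example above and compares the top bit of each partial product with the top bit of the final product.

```python
def partial_products(a, b, width=4):
    """Shifted partial products of a * b for width-bit unsigned operands."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def top_bit(x, width=4):
    """Most significant bit of a 2*width-bit value."""
    return (x >> (2 * width - 1)) & 1


if __name__ == "__main__":
    for a, b in [(0b1111, 0b1000), (0b1111, 0b1111)]:
        pps = partial_products(a, b)
        product = sum(pps)
        print(f"{a:04b} * {b:04b} = {product:08b}")
        print("  top bit of each partial product:", [top_bit(p) for p in pps])
        print("  top bit of the product:", top_bit(product))
```

For 4'b1111 * 4'b1111 no individual partial product has its top bit set, yet the product 8'b11100001 does, so under this simplified view the top bit only becomes known once the carries from the lower positions have propagated.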

