Justin Olbrantz

Fastest Cortex M0+ Thumb 32x32=64 multiplication function?

Does anyone have (or can easily write) an optimal inline assembly function for the ARM Cortex M0+ processor in Thumb mode to multiply two 32-bit numbers and return a 64-bit number?

As the M0+ does not have a long multiply instruction, the 64-bit result has to be built up from 32x32=32 multiplies. For this the compiler calls __aeabi_lmul, which performs a 64x64=64 multiplication in 34 instructions. I'm hoping a significantly faster algorithm exists, given that the inputs are only 32 bits.
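For reference, the plain C form in question would presumably look something like the sketch below. The function name is hypothetical. The cast widens one operand, so the multiply is done in 64 bits, and that is the operation a compiler targeting the M0+ would typically hand off to __aeabi_lmul.

```c
#include <stdint.h>

/* Hypothetical baseline: widening one operand makes this a 64-bit
 * multiply, which a Cortex-M0+ toolchain typically lowers to a
 * library call rather than a single instruction. */
uint64_t mul32x32_baseline(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

This is the version any hand-written routine would have to beat, and it is also a convenient oracle for checking one.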

Upvotes: 4

Views: 2833

Answers (2)

AShelly

Reputation: 35540

I posted a 26-cycle version on Code Review. There are suggestions there for getting it down to 24 or 25 cycles.

Upvotes: 1

old_timer

Reputation: 71536

So are you talking about unsigned or signed multiplication? If signed, then you are doing a 64x64=64 anyway, not a 32x32=64. If unsigned, then take the source code for the gcc library function and modify it, since you know that the upper halves of the operands are zero.

Or look at Hacker's Delight (hackersdelight.org) and see if there is an algorithm there that runs faster than the gcc library function.
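To illustrate the unsigned case, here is a minimal C sketch of what a routine looks like once the upper halves of the 64-bit operands are known to be zero. It is the usual schoolbook split into 16-bit halves, not the gcc library source and not the Code Review version. The function name is hypothetical, and no cycle count is claimed for it. Every multiply is 32x32=32, which is the only kind the M0+ provides.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 32x32=64 built from four 16x16
 * partial products, each computed with a 32-bit multiply. */
uint64_t umul32x32_64(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t ll = a_lo * b_lo;   /* weight 2^0  */
    uint32_t lh = a_lo * b_hi;   /* weight 2^16 */
    uint32_t hl = a_hi * b_lo;   /* weight 2^16 */
    uint32_t hh = a_hi * b_hi;   /* weight 2^32 */

    /* Sum everything that lands in result bits 16..31. Three 16-bit
     * values are added, so this cannot overflow 32 bits. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    uint32_t lo = (mid << 16) | (ll & 0xFFFFu);
    /* The carry out of bits 16..31 is mid >> 16. */
    uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | lo;
}
```

A hand-written Thumb version would follow the same four-multiply structure, with the carry handling done with ADDS/ADCS rather than the masking used here.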

Upvotes: 0
