Reputation: 919
I'm hoping to use the CH32V003 (an RV32EC processor) to run ColorChord, which makes extensive use of multiply-adds to perform DFTs. ColorChord can operate at very low bit depths, with 16- or even 8-bit multiplies. However, the RV32EC core in the CH32V003 doesn't support the RISC-V M (multiply) extension.
I've tried exploring options on godbolt (see https://godbolt.org/z/zqTEaeecr) to see what the compiler does in these situations, but it seems to only call __mulsi3, which performs a naive 32-bit multiply: https://github.com/gcc-mirror/gcc/blob/master/libgcc/config/epiphany/mulsi3.c
What I'm hoping for is some ultra-efficient route, such as a combined multiply-and-shift specialized for different situations.
Is there a good guide or discussion about performing extremely efficient multiplies for special combinations of bit widths and signedness on architectures that don't have a hardware multiplier?
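One option for the narrow cases asked about here is to specialize the same shift-and-add scheme that the linked __mulsi3 uses to the operand width you actually have. The sketch below assumes 8-bit operands; the names `umul8x8`, `smul8x8`, `q7_mul` and `narrow_mul_check` are hypothetical, not from any library. The loop has a fixed trip count of 8 and adds `a` through a mask instead of a branch, which gives the compiler a chance to inline and unroll it. Whether that actually beats a call to __mulsi3 on the CH32V003 has to be checked in the generated code and by timing it. Signed operands are handled by multiplying magnitudes and fixing the sign afterwards, and `q7_mul` wraps a multiply followed by a shift for Q7 fixed-point values (it is not a fused instruction, just one helper).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 8x8 -> 16-bit product by shift-and-add.
 * Fixed trip count of 8. The mask 0 - (b & 1) is all ones when the low
 * bit of b is set and zero otherwise, so there is no data-dependent branch. */
static inline uint32_t umul8x8(uint8_t a8, uint8_t b8)
{
    uint32_t a = a8, b = b8, result = 0;
    for (int i = 0; i < 8; i++) {
        result += a & (0u - (b & 1u)); /* add a << i if bit i of b8 is set */
        a <<= 1;
        b >>= 1;
    }
    return result;
}

/* Hypothetical helper: signed 8x8 -> 16-bit product via sign-magnitude.
 * |-128| = 128 still fits in uint8_t, so every int8_t input is handled. */
static inline int32_t smul8x8(int8_t a, int8_t b)
{
    uint8_t ua = (uint8_t)(a < 0 ? -a : a);
    uint8_t ub = (uint8_t)(b < 0 ? -b : b);
    int32_t mag = (int32_t)umul8x8(ua, ub);
    return ((a < 0) != (b < 0)) ? -mag : mag;
}

/* Hypothetical helper: Q7 fixed-point multiply, (|a| * |b|) >> 7 with the
 * sign applied afterwards, so the result truncates toward zero.
 * -1.0 * -1.0 = +1.0 is not representable in Q7, so it saturates to 127. */
static inline int8_t q7_mul(int8_t a, int8_t b)
{
    uint8_t ua = (uint8_t)(a < 0 ? -a : a);
    uint8_t ub = (uint8_t)(b < 0 ? -b : b);
    uint32_t mag = umul8x8(ua, ub) >> 7;   /* 0..128 */
    if ((a < 0) != (b < 0))
        return (int8_t)(-(int32_t)mag);    /* -127..0 */
    return (int8_t)(mag > 127 ? 127 : mag);
}

/* Exhaustive self-check against the compiler's own '*' operator
 * (which becomes a __mulsi3 call on a core without the M extension). */
static void narrow_mul_check(void)
{
    for (int a = -128; a < 128; a++)
        for (int b = -128; b < 128; b++) {
            assert(umul8x8((uint8_t)a, (uint8_t)b)
                   == (uint32_t)((uint8_t)a * (uint8_t)b));
            assert(smul8x8((int8_t)a, (int8_t)b) == a * b);
        }
}
```

The same pattern extends to 16-bit operands by raising the trip count to 16; the cost grows linearly with the multiplier width, which is why keeping one operand at 8 bits helps.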
Upvotes: 3
Views: 440
Reputation: 21
You've got 16 kB of flash available. Why not use 1 kB of it to store a "squares/4" (quarter-squares) table, such as:
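#include <stdint.h>
// Sqr_4[i] = floor(i*i/4) for i = 0..510 (511 entries, about 1 kB of flash)
const uint16_t Sqr_4[511]={0/4,1/4, 4/4, 9/4, 16/4, 25/4, ..., 260100/4};
// Quarter-squares multiply: x*y = floor((x+y)^2/4) - floor((x-y)^2/4)
uint16_t umul8b( uint8_t x, uint8_t y){
    return Sqr_4[(uint16_t)x+y]-((x>y)?Sqr_4[x-y]:Sqr_4[y-x]);
}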
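A self-contained version of the quarter-squares idea above is sketched below, so it can be checked exhaustively on a desktop before moving it to the chip. It works because x+y and x-y always have the same parity, so the two truncated quarter-squares differ by exactly x*y. To keep the example short, the table is filled once by a hypothetical `sqr4_init()` instead of being written out as a literal; it builds the squares with the recurrence (i+1)^2 = i^2 + 2i + 1, so no multiply is needed. On the CH32V003 you would more likely keep the `const` initializer from the answer (typically placed in flash), since a RAM copy would use 1 kB of the chip's 2 kB of SRAM.

```c
#include <assert.h>
#include <stdint.h>

/* Sqr_4[i] = floor(i*i / 4) for i = 0..510; 510*510/4 = 65025 fits in uint16_t. */
static uint16_t Sqr_4[511];

/* Hypothetical init routine: builds the table without any multiply,
 * using (i+1)^2 = i^2 + 2i + 1. */
static void sqr4_init(void)
{
    uint32_t sq = 0;                     /* holds i*i */
    for (uint32_t i = 0; i < 511; i++) {
        Sqr_4[i] = (uint16_t)(sq >> 2);  /* floor(i*i / 4) */
        sq += (i << 1) + 1;
    }
}

/* Quarter-squares multiply from the answer: two table lookups,
 * one add, one subtract and a compare. */
static uint16_t umul8b(uint8_t x, uint8_t y)
{
    return Sqr_4[(uint16_t)x + y] - ((x > y) ? Sqr_4[x - y] : Sqr_4[y - x]);
}

/* Exhaustive check of all 65536 input pairs. */
static void umul8b_check(void)
{
    sqr4_init();
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++)
            assert(umul8b((uint8_t)x, (uint8_t)y) == x * y);
}
```

Call `sqr4_init()` once at startup (or use the `const` table instead), then `umul8b(x, y)` anywhere an 8x8 product is needed.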
Upvotes: 2