Reputation: 2783
I am writing a program for a embedded computer and have VERY little memory and processing power to work with.
y and a are doubles stored in floating-point registers, and x is an array of doubles. What is the most efficient way of writing this expression in MIPS?
y = y + a * x[i];
Upvotes: 0
Views: 465
Reputation: 16596
I'm not fluent in MIPS assembler, so I will not bother with actual MIPS instructions, I will use something like plain English halfway to z80/x86 TASM, hopefully you will get the idea.
And I will assume you want to add a whole array, not just this single line, because that changes everything about the task.
If you really want just to optimize this single line, there's little room for making it fancy. Just load x[i], multiply it by a, and add result to y.
If you are talking about some fixed size array (like size 4 in matrices), there may be some direct unrolled way to do it faster than the following thing from me.
If we are talking about some array, that's different (but you should have posted it like that), you can save many (n-1) multiplications by summing the x array first:
load r1, x_array_pointer
load r2, x_array_end_pointer
load fpr0, zero_value
:loop_sum_x_array
add fpr0,[r1]
add r1,size_of_double
cmp r1,r2
jump_less loop_sum_x_array ; till whole array is summed
mul fpr0, *a* ; now multiply sum{x} by "a"
add fpr0, *y* ; and add initial "y" value
; fpr0 contains result
"Algorithm": y + a*x0 + a*x1 + a*x2 + ... = y + a*(x0 + x1 + x2 + ...) (If you didn't figure this one on your own before you posted at SO, you either didn't even try, or you are 8y old, or you should seriously do some thinking and basic math exercises, because this is like obvious. Heh, actually, at this level of difficulty it's pure fun, why do you let others at SO live yours fun? You are very generous sir. :) )
Memory: this doesn't use any additional memory, only the input y, a and x, you need few temporary registers (r1, r2, fpr0) (so as long as you are not doing 8bit CPU exercise, you should have enough of spare ones for this).
Processing power: complexity of algorithm is O(n) (and as you have to add each value from x array, you can't beat that). The inner loop is using quite basic instructions: one floating point addition, load of double-value from memory, address incrementing, compare and conditional jump. Then it needs literally one floating point multiply and one more fp addition. The x array is accessed in sequential way, so memory cache misses should be minimal.
If your CPU has any specialized instructions like MMX, the sum for large arrays can be written probably faster by utilizing those. But on modern CPU+RAM for large arrays you will be mostly limited by memory cache speed, as that inner loop is like non-existent for GHz CPU (except loading value from memory of course).
edit: as Michael noted, using C compiler is the right way, I did my answer just for fun of writing some pseudo-assembler. I'm not sure what your platform is, but if it's worth something, there must be cross compiler for PC plus way to get the binary result to the target.
Upvotes: 0