Reputation: 149
I’m trying to figure out what are some loop optimizations (in compilers) that can be applied to (linear algebra) vector summation?
Vector sum is defined
for i in range(len(vector1)):
vector_result[i] = vector1[i] + vector2[i]
How can I optimize this?
Some loop optimizations I’m aware of:
If anybody out there know a thing or two about compilers and loop optimizations i will appreciate if you give me an example of the code after a loop optimization is applied. Thanks
Upvotes: 0
Views: 159
Reputation: 149
So, I found this beautiful of a paper by Frances Allen:
Based on her explanation I think I can unroll the vector sum loop by two or four, etc.
here's an example of unrolling the loop by two.
for i in range(0, len(v), 2):
vector_result[i] = v[i] + v[i]
vector_result[i+1] = v[i+1] + v[i+1]
Upvotes: 0
Reputation: 75697
Note: This answer assumes sum =
is a typo and the OP meant sum +=
The most important optimizations here are loop unrolling and vectorization.
The pseudocode would look like this:
Some notations and assumptions:
v[i : i + 4]
represents a SIMD element, sum_4
again a SIMD element.i = 0
sum_4 = 0..0;
// header
while vec1 + i !aligned at 16 bytes:
sum_4[0] += vec1[i] + vec2[i];
++i
// simd - can be further unrolled with multiple vector registers
while i + 4 < len(vec1)
sum_4 = vec1[i : i + 4] + vec2[i : i + 4]
i += 4
// footer
while i < len(vec1)
sum_4[0] = vec1[i] + vec2[i]
++i
// aggregate
int sum = horizontal_add(sum_4)
Upvotes: 1