I’m trying to figure out what are some loop optimizations (in compilers) that can be applied to (linear algebra) vector summation? Vector sum is defined for i in range(len(vector1)): vector_result[i] = vector1[i] + vector2[i] How can I optimize this? Some loop optimizations I’m aware of: Loop fission Loop fusion Vectorization Tiling But I don’t know how any of these can be applied to the summation of vectors. If anybody out there know a thing or two about compilers and loop optimizations i will appreciate if you give me an example of the code after a loop optimization is applied. Thanks

compiler-constructioncompiler-optimization

Reputation: 149

What are some loop optimizations that can be applied to vector summation?

I’m trying to figure out what are some loop optimizations (in compilers) that can be applied to (linear algebra) vector summation?

Vector sum is defined

for i in range(len(vector1)):
     vector_result[i] = vector1[i] + vector2[i]

How can I optimize this?

Some loop optimizations I’m aware of:

Loop fission
Loop fusion
Vectorization
Tiling But I don’t know how any of these can be applied to the summation of vectors.

If anybody out there know a thing or two about compilers and loop optimizations i will appreciate if you give me an example of the code after a loop optimization is applied. Thanks

Upvotes: 0

Answers (2)

chez93

Reputation: 149

So, I found this beautiful of a paper by Frances Allen:

paper

Based on her explanation I think I can unroll the vector sum loop by two or four, etc.

here's an example of unrolling the loop by two.

for i in range(0, len(v), 2):
    vector_result[i] = v[i] + v[i]
    vector_result[i+1] = v[i+1] + v[i+1]

Upvotes: 0

bolov

Reputation: 75697

Note: This answer assumes sum = is a typo and the OP meant sum +=

The most important optimizations here are loop unrolling and vectorization.

The pseudocode would look like this:

Some notations and assumptions:

v[i : i + 4] represents a SIMD element, sum_4 again a SIMD element.
16 bytes alignment requirement

i = 0
sum_4 = 0..0;

// header
while vec1 + i !aligned at 16 bytes:
    sum_4[0] += vec1[i] + vec2[i];
    ++i

// simd - can be further unrolled with multiple vector registers
while i + 4 < len(vec1)
    sum_4 = vec1[i : i + 4] + vec2[i : i + 4]
    i += 4

// footer
while i < len(vec1)
    sum_4[0] = vec1[i] + vec2[i]
    ++i

// aggregate
int sum = horizontal_add(sum_4)

Upvotes: 1

What are some loop optimizations that can be applied to vector summation?

Answers (2)

Related Questions