C++ Matrix Multiplication Auto-Vectorization

I have auto-vectorization enabled. When I compile the code, I receive the following warning:

info C5002: loop not vectorized due to reason '1203'

MSDN specifies that the

Loop body includes non-contiguous accesses into an array.

I've look into these links, 1, 2, for help, but have had no luck.

Here is my source code:

for (int row = 0; row < size; ++row) {
    for (int col = 0; col < size; ++col) {
        float tmp = 0;
        for (int i = 0; i < size; ++i) { // This loop generates the warning above
            tmp += matrixA[row][i] * matrixB[i][col];
        }
        matrixResult[row][col] = tmp;
    }
}

Any help is welcomed.

Upvotes: 0

Answers (4)

mostafa.elhoushi

Reputation: 5322

This worked with me:

#define N 1000

void example(int A[N][N], int B[N][N], int C[N][N]) {
    for (int i=0; i<N; i++) {
        for (int j=0; j<N; j++) {
            int res = 0;
            #pragma clang loop vectorize(enable) vectorize_width(8)
            for (int k=0; k<N; k++) {
                res += A[i][k] * B[j][k];
            }
            C[i][j] = res;
        }
    }
}

See https://godbolt.org/z/K4foqd37T

UPDATE (Thanks to @Peter Cordes): that works if one input matrix is already transposed,

Upvotes: 1

Marius D

Reputation: 78

One way to reach contiguous access: you can swap the inner two loops. Instead of for row, for col, for i you have for row, for i, for col. See the resulted code bellow. Now the access of both matrixResult and matrixB is along col, so it is contiguous.

for (int row = 0; row < size; ++row) {
    for (int i = 0; i < size; ++i) {
        int a_row_i = matrixA[row][i];
        for (int col = 0; col < size; ++col) {
            matrixResult[row][col] += a_row_i * matrixB[i][col];
        }
    }
}

Upvotes: 1

gbjbaanb

Reputation: 52679

2D arrays are stored as a single contiguous block of memory, so a 3x2 element 2D array is actually a 6 elements laid out end to end.

The [] indexing operators simply calculate which element to access.

So what's happening here is that matrixA is being accessed from element 1 through to element 6 sequentially (ie A1, A2, A3, B1, B2, B3).

matrixB however, is being accessed 'randomly', A1, B1, A2, B2 etc which maps onto the actual storage as accessing elements 1 then 4 then 2 then 5.

You can't change the order you access the elements of matrixB, but you could transpose it so that the elements are in the right order to be accessed sequentially. Obviously, if you only do this multiplication once, it might not be worth the effort to re-calculate matrixBs ordering, but if you are performing this calculation repeatedly, then the effort will be very much worth it.

Upvotes: 2

Armen Avetisyan

Reputation: 1258

If matrix A and B have the same storage order (e.g. row major), then you cannot vectorize it anyway. So that makes the warning plausible.

Just an advice here: if you want serious high performance computing then you should give up on 2D arrays. The gain in caching is way bigger than the vectorization speed up.

Upvotes: 1

C++ Matrix Multiplication Auto-Vectorization

Answers (4)

Related Questions