Reputation: 1090
I'm fairly new to SIMD and wanted to see if I could get GCC to vectorise a simple operation for me.
So I looked at this post and wanted to do more or less the same thing (but with GCC 5.4.0 on 64-bit Linux, for a Kaby Lake processor).
I essentially have this function:
/* m1 = N x M matrix, m2 = M x P matrix, m3 = N x P matrix & output */
void mmul(double **m1, double **m2, double **m3, int N, int M, int P)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < P; j++)
        {
            double tmp = 0.0;
            /* dot product of row i of m1 and column j of m2 */
            for (int k = 0; k < M; k++)
                tmp += m1[i][k] * m2[k][j];
            m3[i][j] = tmp; /* store the result in the output matrix */
        }
}
I compile it with -O2 -ftree-vectorize -msse2 -ftree-vectorizer-verbose=5, but I don't see any message saying that vectorization was done.
If anyone could help me out, that would be very much appreciated.
Reputation: 2536
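Your command doesn't enable any report that would tell you vectorization was done: in GCC 5, -ftree-vectorizer-verbose is deprecated and does nothing. You can use -fopt-info-vec to turn the vectorization report on (and -fopt-info-vec-missed to see which loops were not vectorized and why). But don't rely on the report alone. The compiler sometimes "lies": it vectorizes a loop and reports it, but the vectorized code is not actually used (for example, when a runtime check such as an aliasing test selects the scalar version). So check the real improvement by measuring the speedup. First, disable vectorization and measure the time t1. Then enable it and measure the time t2. The speedup is t1/t2: if it is greater than 1, the compiler improved the code; if it is 1, there is no improvement; and if it is less than 1, the auto-vectorizer made things worse. Another way is to add -S to your command and inspect the generated assembly in a separate .s file: packed instructions such as mulpd/addpd (or vmulpd with AVX) indicate vectorized code, while mulsd/addsd are scalar.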
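The timing approach above can be sketched as follows. This is a minimal, hypothetical harness, not code from the question: it assumes the matrices are stored as flat row-major arrays (the names mmul_flat, time_mmul and speedup are made up for illustration) and uses the POSIX clock_gettime with CLOCK_MONOTONIC. It takes the best of several runs to reduce noise and feeds the result into a volatile variable so the compiler cannot discard the work.

```c
#define _POSIX_C_SOURCE 200809L /* exposes clock_gettime in <time.h> */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Written to after each run so the benchmarked work is not optimized away. */
static volatile double sink;

/* Hypothetical kernel under test: the question's loops on flat row-major
   arrays (a is n x m, b is m x p, c is n x p). */
void mmul_flat(const double *a, const double *b, double *c, int n, int m, int p)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < p; j++) {
            double tmp = 0.0;
            for (int k = 0; k < m; k++)
                tmp += a[i * m + k] * b[k * p + j];
            c[i * p + j] = tmp;
        }
}

/* Current time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Best-of-reps wall time for one n x n multiply; -1.0 if allocation fails. */
double time_mmul(int n, int reps)
{
    assert(n > 0 && reps > 0);
    size_t count = (size_t)n * (size_t)n;
    double *a = malloc(count * sizeof *a);
    double *b = malloc(count * sizeof *b);
    double *c = malloc(count * sizeof *c);
    double best = -1.0;
    if (a && b && c) {
        for (size_t i = 0; i < count; i++) {
            a[i] = (double)(i % 7);
            b[i] = (double)(i % 5);
        }
        for (int r = 0; r < reps; r++) {
            double t0 = now_seconds();
            mmul_flat(a, b, c, n, n, n);
            double t = now_seconds() - t0;
            if (best < 0.0 || t < best)
                best = t;
            sink += c[count - 1]; /* consume the result */
        }
    }
    free(a);
    free(b);
    free(c);
    return best;
}

/* Speedup of the vectorized build (time t2) over the scalar build (time t1). */
double speedup(double t1, double t2)
{
    return t1 / t2;
}
```

To use it, you would write your own small main that prints time_mmul(512, 5), build it once with -O2 -fno-tree-vectorize to get t1 and once with -O2 -ftree-vectorize (optionally plus -march=native) to get t2, and then compute speedup(t1, t2).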
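NOTE: if you want to see what the auto-vectorizer can really do, add -march=native and remove -msse2. On x86-64, SSE2 is already enabled by default, so -msse2 changes nothing, whereas -march=native lets GCC use the AVX2 and FMA instructions that a Kaby Lake CPU supports.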
UPDATE: When you use variables such as N, M, etc. as loop bounds, you might not see vectorization, so try using constants instead. In my experience, matrix-matrix multiplication is vectorizable with GCC 4.8, 5.4 and 6.2. Other compilers such as Clang/LLVM, ICC and MSVC vectorize it as well. As mentioned in the comments, if you use double or float data types you might need -ffast-math (which is enabled by the -Ofast optimization level) to tell the compiler that you don't need a bit-for-bit exact result (which is OK most of the time). This is because compilers are more careful about floating-point operations: floating-point addition is not associative, and vectorizing the tmp += m1[i][k] * m2[k][j] sum changes the order in which the terms are added, so GCC only does it when -ffast-math allows that reordering.
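As an illustration of the UPDATE, here is a minimal sketch of the question's kernel rewritten with compile-time constant sizes and plain 2-D arrays (contiguous rows) instead of the N, M, P parameters and double **. The macro names ROWS, INNER, COLS and the function name mmul_fixed are hypothetical, and whether a given GCC version actually vectorizes this loop nest depends on the version and flags, so confirm it with -fopt-info-vec (or the .s file) rather than assuming it.

```c
#include <assert.h> /* for static_assert (C11) */

/* Hypothetical compile-time sizes standing in for N, M and P. */
#define ROWS  64 /* N */
#define INNER 64 /* M */
#define COLS  64 /* P */

static_assert(ROWS > 0 && INNER > 0 && COLS > 0, "matrix sizes must be positive");

/* c = a * b with trip counts known at compile time. */
void mmul_fixed(double a[ROWS][INNER], double b[INNER][COLS], double c[ROWS][COLS])
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++) {
            double tmp = 0.0;
            /* Vectorizing this sum reorders the additions, which GCC
               only permits for floating point under -ffast-math. */
            for (int k = 0; k < INNER; k++)
                tmp += a[i][k] * b[k][j];
            c[i][j] = tmp;
        }
}
```

A possible build line for checking it is gcc -O3 -march=native -ffast-math -fopt-info-vec -c mmul_fixed.c; dropping -ffast-math and comparing the reports shows how much of the vectorization depends on it.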