Reputation: 75
simd pragma can be used with icc compiler to perform a reduction operator:
#pragma simd
#pragma simd reduction(+:acc)
#pragma ivdep
for(int i( 0 ); i < N; ++i )
{
acc += x[i];
}
Is there any equivalent solution in msvc or/and gcc?
Ref(p28): http://d3f8ykwhia686p.cloudfront.net/1live/intel/CompilerAutovectorizationGuide.pdf
Upvotes: 3
Views: 1577
Reputation: 620
gcc requires -ffast-math to enable this optimization (as mentioned in the reference given above), regardless of use of #pragma omp simd reduction. icc is becoming less reliant on pragma for this optimization (except that /fp:fast is needed in absence of pragma), but the extra ivdep and simd pragmas in the original post are undesirable. icc may do bad things when given a pragma simd which doesn't include all relevant reduction, firstprivate, and lastprivate clauses (and gcc may break with -ffast-math, particularly in combination with -march or -mavx). msvc 2012/2013 are very limited in auto-vectorization. There are no simd reductions, no vectorization within OpenMP parallel regions, no vectorization of conditionals, and no advantage is taken of __restrict in vectorizations (there is some run-time check to vectorize less efficiently but safely without __restrict).
Upvotes: 1
Reputation: 75
For Visual Studio 2012:
With options /O1 /O2/GL
, to report vectorization use /Qvec-report:(1/2)
int s = 0;
for ( int i = 0; i < 1000; ++i )
{
s += A[i]; // vectorizable
}
In the case of reductions over "float
" or "double
" types, vectorization requires that the /fp:fast
switch is thrown. This is because vectorizing the reduction operation depends upon "floating point reassociation". Reassociation is only allowed when /fp:fast
is thrown
Ref(associated doc;p12) http://blogs.msdn.com/b/nativeconcurrency/archive/2012/07/10/auto-vectorizer-in-visual-studio-11-cookbook.aspx
Upvotes: 3
Reputation: 7009
GCC definitely can vectorize. Suppose you have file reduc.c with contents:
int foo(int *x, int N)
{
int acc, i;
for( i = 0; i < N; ++i )
{
acc += x[i];
}
return acc;
}
Compile it (I used gcc 4.7.2) with command line:
$ gcc -O3 -S reduc.c -ftree-vectorize -msse2
Now you can see vectorized loop in assembler.
Also you may switch on verbose vectorizer output say with
$ gcc -O3 -S reduc.c -ftree-vectorize -msse2 -ftree-vectorizer-verbose=1
Now you will get console report:
Analyzing loop at reduc.c:5
Vectorizing loop at reduc.c:5
5: LOOP VECTORIZED.
reduc.c:1: note: vectorized 1 loops in function.
Look at the official docs to better understand cases where GCC can and cannot vectorize.
Upvotes: 2