Reputation: 58
I am attempting to vectorize the inner loop of my alignment function, and have run into a problem I do not understand. When two elements that are sequential in the input array are compared, the loop does not vectorize, but when the elements being compared are offset by 2, it successfully vectorizes. A minimal example:
#include <cstdlib>

int *vec_test(int *input) {
    int i, n1, n2;
    int *out = (int *) malloc(100 * sizeof(int));

    // This loop fails to vectorize: it compares adjacent elements
    for (i = 1; i < 100; i++) {
        n1 = input[i-1];
        n2 = input[i];
        out[i] = n1 > n2 ? n1 : n2;
    }

    // This loop successfully vectorizes: it compares elements two apart
    for (i = 1; i < 100; i++) {
        n1 = input[i-1];
        n2 = input[i+1];
        out[i] = n1 > n2 ? n1 : n2;
    }

    return out;
}
When I compile this code with clang (clang++ -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -c minimal.cpp), the second loop vectorizes but the first does not:
minimal.cpp:17:17: remark: loop not vectorized: value that could not be identified as reduction is used outside the loop
minimal.cpp:23:3: remark: vectorized loop (vectorization factor: 4, unrolling interleave factor: 1) [-Rpass=loop-vectorize]
The only difference is that the elements being compared are consecutive in the first loop, and offset by 2 in the second loop. Why does the first loop fail to vectorize?
Edit: Replacing the ints with different width types (int64_t, int32_t, or int16_t) yields the same results in all cases: the bottom loop vectorizes, the top loop fails to do so.
Upvotes: 2
Views: 1753
Reputation: 14619
This failure looks like a bug in clang around version 3.8 that has been fixed as of 3.9.0. With 3.9.0, both loops vectorize:
$ clang++ -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -c minimal.cpp
minimal.cpp:8:3: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-analysis=loop-vectorize]
  for(i=1;i<100;i++) {
  ^
minimal.cpp:8:3: remark: vectorized loop (vectorization width: 4, interleaved count: 1) [-Rpass=loop-vectorize]
minimal.cpp:15:3: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-analysis=loop-vectorize]
  for(i=1;i<100;i++) {
  ^
minimal.cpp:15:3: remark: vectorized loop (vectorization width: 4, interleaved count: 1) [-Rpass=loop-vectorize]
$ clang++ --version
clang version 3.9.0 (tags/RELEASE_390/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/clang-latest/bin
See also https://godbolt.org/g/Nw0kk1
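A possible explanation for the old behaviour, which is my assumption and not something the remarks above confirm: in the first loop, the value loaded as input[i] in one iteration is the same value needed as input[i-1] in the next. If the optimizer removes the redundant load and carries that value across iterations instead, the loop gains a loop-carried dependence that is not a reduction, which would fit the "could not be identified as reduction" remark. The second loop has no such reuse between consecutive iterations. The sketch below writes both forms by hand and checks that they agree; the names max_adjacent, max_adjacent_carried and forms_agree are made up for illustration, and std::vector is used here only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Form 1: as written in the question. Each iteration does two loads and
// nothing is carried from one iteration to the next.
// (out[0] is set to 0 here; the question's code leaves it uninitialized.)
std::vector<int> max_adjacent(const std::vector<int>& input) {
    assert(!input.empty());
    std::vector<int> out(input.size(), 0);
    for (std::size_t i = 1; i < input.size(); ++i) {
        int n1 = input[i - 1];
        int n2 = input[i];
        out[i] = n1 > n2 ? n1 : n2;
    }
    return out;
}

// Form 2: hypothetical hand-written equivalent of what the optimizer is
// ASSUMED to produce. One load per iteration; 'prev' is carried across
// iterations, which makes the loop depend on its previous iteration.
std::vector<int> max_adjacent_carried(const std::vector<int>& input) {
    assert(!input.empty());
    std::vector<int> out(input.size(), 0);
    int prev = input[0];
    for (std::size_t i = 1; i < input.size(); ++i) {
        int cur = input[i];
        out[i] = prev > cur ? prev : cur;
        prev = cur;  // loop-carried value, not a reduction
    }
    return out;
}

// Returns true when both forms compute the same output for a sample input.
bool forms_agree() {
    const std::vector<int> input = {3, 1, 4, 1, 5, 9, 2, 6};
    return max_adjacent(input) == max_adjacent_carried(input);
}
```

Compiling both functions with the -Rpass flags from the question on an old and a new clang should show whether the carried form is the one that trips the older vectorizer; I have not verified that on 3.8 myself.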
Upvotes: 1