benjjneb

Reputation: 58

Clang fails to vectorize when comparing sequential array elements

I am attempting to vectorize the inner loop of my alignment function, and have run into a problem I do not understand. When two elements that are sequential in the input array are compared, the loop does not vectorize, but when the elements being compared are offset by 2, it successfully vectorizes. A minimal example:

#include <stdlib.h>

int *vec_test(int *input) {
  int i, n1, n2;
  int *out = (int *) malloc(100 * sizeof(int));

  // This loop fails to vectorize
  for(i=1;i<100;i++) {
    n1 = input[i-1];
    n2 = input[i];
    out[i] = n1 > n2 ? n1 : n2;
  }

  // This loop successfully vectorizes
  for(i=1;i<100;i++) {
    n1 = input[i-1];
    n2 = input[i+1];
    out[i] = n1 > n2 ? n1 : n2;
  }

  return(out);
}

When I use clang to compile this code (clang++ -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -c minimal.cpp) the second loop vectorizes, but the first loop does not.

minimal.cpp:17:17: remark: loop not vectorized: value that could not be identified as reduction is used outside the loop

minimal.cpp:23:3: remark: vectorized loop (vectorization factor: 4, unrolling interleave factor: 1) [-Rpass=loop-vectorize]

The only difference is that the elements being compared are consecutive in the first loop, and offset by 2 in the second loop. Why does the first loop fail to vectorize?

Edit: Replacing the ints with different width types (int64_t, int32_t, or int16_t) yields the same results in all cases: the bottom loop vectorizes, the top loop fails to do so.

Upvotes: 2

Views: 1753

Answers (1)

Brian Cain

Reputation: 14619

This looks like a bug in clang around 3.8 that was fixed by 3.9.0. With 3.9.0, both loops vectorize:

$ clang++ -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -c minimal.cpp
minimal.cpp:8:3: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-analysis=loop-vectorize]
  for(i=1;i<100;i++) {
  ^
minimal.cpp:8:3: remark: vectorized loop (vectorization width: 4, interleaved count: 1) [-Rpass=loop-vectorize]
minimal.cpp:15:3: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-analysis=loop-vectorize]
  for(i=1;i<100;i++) {
  ^
minimal.cpp:15:3: remark: vectorized loop (vectorization width: 4, interleaved count: 1) [-Rpass=loop-vectorize]

$ clang++ --version
clang version 3.9.0 (tags/RELEASE_390/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/clang-latest/bin

See also https://godbolt.org/g/Nw0kk1
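
A possible explanation for the 3.8 remark, offered as an assumption and not confirmed by the output above: in the first loop, the value loaded as input[i] in one iteration is the same value needed as input[i-1] in the next. If the optimizer reuses that load instead of reloading it, the loop ends up carrying a value from one iteration to the next, and a vectorizer that only recognizes reductions would reject it. The second loop has no such overlap between consecutive iterations. The sketch below writes out both forms by hand; the function names and the check helper are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Form 1: the first loop as written in the question. Both elements are
// loaded from memory on every iteration.
// Assumes `out` does not overlap `input` (true in the question, where
// `out` comes from malloc).
void max_adjacent_reload(const int *input, int *out, int n) {
  for (int i = 1; i < n; i++) {
    int n1 = input[i - 1];
    int n2 = input[i];
    out[i] = n1 > n2 ? n1 : n2;
  }
}

// Form 2 (assumed): what the loop may look like once the previous
// iteration's load is reused. `prev` is carried across iterations,
// so each iteration depends on a value produced by the one before it.
void max_adjacent_carried(const int *input, int *out, int n) {
  if (n < 2) return;
  int prev = input[0];
  for (int i = 1; i < n; i++) {
    int cur = input[i];
    out[i] = prev > cur ? prev : cur;
    prev = cur;
  }
}

// Hypothetical helper: checks that the two forms agree on a small input.
void check_forms_agree() {
  const int input[6] = {3, 1, 4, 1, 5, 9};
  int a[6] = {0};
  int b[6] = {0};
  max_adjacent_reload(input, a, 6);
  max_adjacent_carried(input, b, 6);
  for (int i = 1; i < 6; i++) {
    assert(a[i] == b[i]);
  }
}
```

Both forms compute the same result when the arrays do not overlap. Compiling each with -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize on the compiler version in question is a direct way to check whether the carried value is what blocks vectorization.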

Upvotes: 1
