xxx222
xxx222

Reputation: 3244

Auto vectorization with Rust

I am new with rust/SIMD and I have the following code snippet as the bottleneck of my program, I am wondering if I can leverage on the autovectorization feature of it

fn is_subset(a: Vec<i64>, b: Vec<i64>) -> bool {
    for i in 0..a.len() {
        if (a[i] & !b[i]) != 0 {
            return false;
        }
    }
    true
}

I also have an alternative way for writing this (with iterator so trip count will be known up front), would this create autovectorization instead?

fn is_subset(a: Vec<i64>, b: Vec<i64>) -> bool {
    return a.iter().zip(b.iter()).all(|(x, y)| x & y == *x)
}

Upvotes: 4

Views: 5121

Answers (1)

Peter Cordes
Peter Cordes

Reputation: 365757

LLVM (and GCC) don't know how to auto-vectorize loops whose trip-count can't be calculated up front. This rules out search loops like this.

ICC classic can auto-vectorize such loops, but it's a C/C++ compiler, without a front-end for Rust.

Probably your only hope would be to manually loop over 2, 4, or 8-element chunks of the arrays, branchlessly calculating your condition based on all those elements. If you're lucky, LLVM might turn that into operations on one SIMD vector. So using that inner loop inside a larger loop could result in getting the compiler to make vectorized asm, for example using AVX vptest (which sets CF according to bitwise a AND (not b) having any non-zero bits).

i.e. manually express the "unrolling" of SIMD elements in your source, for a specific vector width.

Related re: getting compilers to auto-vectorize with x86 ptest:

If the whole array is small enough, a branchless reduction over the whole array (ORing boolean results together) could be something a compiler is willing to do something with. If compiling for x86, you'd be hoping for asm like pandn / por in the loop, and a horizontal reduction at the end, so see if any bits were set in the intersection of a and not b.

i64 is only 2 elements per 16-byte vector, so the compiler would have to see a good strategy for it to be profitable to auto-vectorize, especially on a 64-bit machine. Having 32-byte vectors available makes it more attractive.

Upvotes: 8

Related Questions