SSE instruction to check if byte array is zeroes C#

Question

Suppose I have a byte[] and want to check if all bytes are zeros. For loop is an obvious way to do it, and LINQ All() is a fancy way to do it, but highest performance is critical.

How can I use Mono.Simd to speed up checking if byte array is full of zeroes? I am looking for cutting edge approach, not merely correct solution.

ArekBulski · Accepted Answer

Best code is presented below. Other methods and time measuring are available in full source.

static unsafe bool BySimdUnrolled (byte[] data)
{
    fixed (byte* bytes = data) {
        int len = data.Length;
        int rem = len % (16 * 16);
        Vector16b* b = (Vector16b*)bytes;
        Vector16b* e = (Vector16b*)(bytes + len - rem);
        Vector16b zero = Vector16b.Zero;

        while (b < e) {
            if ((*(b) | *(b + 1) | *(b + 2) | *(b + 3) | *(b + 4) |
                *(b + 5) | *(b + 6) | *(b + 7) | *(b + 8) |
                *(b + 9) | *(b + 10) | *(b + 11) | *(b + 12) | 
                *(b + 13) | *(b + 14) | *(b + 15)) != zero)
                return false;
            b += 16;
        }

        for (int i = 0; i < rem; i++)
            if (data [len - 1 - i] != 0)
                return false;

        return true;
    }
}

Eventually it was beaten by this code:

static unsafe bool ByFixedLongUnrolled (byte[] data)
{
    fixed (byte* bytes = data) {
        int len = data.Length;
        int rem = len % (sizeof(long) * 16);
        long* b = (long*)bytes;
        long* e = (long*)(bytes + len - rem);

        while (b < e) {
            if ((*(b) | *(b + 1) | *(b + 2) | *(b + 3) | *(b + 4) |
                *(b + 5) | *(b + 6) | *(b + 7) | *(b + 8) |
                *(b + 9) | *(b + 10) | *(b + 11) | *(b + 12) | 
                *(b + 13) | *(b + 14) | *(b + 15)) != 0)
                return false;
            b += 16;
        }

        for (int i = 0; i < rem; i++)
            if (data [len - 1 - i] != 0)
                return false;

        return true;
    }
}

Time measurements (on 256MB array):

LINQ All(b => b == 0)                   : 6350,4185 ms
Foreach over byte[]                     : 580,4394 ms
For with byte[].Length property         : 809,7283 ms
For with Length in local variable       : 407,2158 ms
For unrolled 16 times                   : 334,8038 ms
For fixed byte*                         : 272,386 ms
For fixed byte* unrolled 16 times       : 141,2775 ms
For fixed long*                         : 52,0284 ms
For fixed long* unrolled 16 times       : 25,9794 ms
SIMD Vector16b equals Vector16b.Zero    : 56,9328 ms
SIMD Vector16b also unrolled 16 times   : 32,6358 ms

Conclusions:

Mono.Simd has only a limited set of instructions. I found no instructions for computing scalar sum(vector) or max(vector). There is however vector equality operator returning bool.
Loop unrolling is a powerful technique. Even fastest code benefits a lot from using it.
LINQ is ungodly slow because it uses delegates from lambda expressions. If you need cutting edge performance then clearly that is not the way to go.
All methods presented use short circuit evaluation, meaning they end as soon as they encounter non-zero.
SIMD code was eventually beaten. There are other questions on SO disputing whether SIMD actually makes things faster.

Posted this code on Peer Review, so far 2 bugs found and fixed.

SSE instruction to check if byte array is zeroes C#

Answers (2)

Related Questions