Pepernoot

Reputation: 3630

Why is looping over Array.AsSpan() faster?

|         Method |     Mean |    Error |   StdDev |
|--------------- |---------:|---------:|---------:|
|  ArrayRefIndex | 661.9 us | 12.95 us | 15.42 us |
| ArraySpanIndex | 640.4 us |  4.08 us |  3.82 us |

Why is looping over array.AsSpan() faster than looping directly over the source array?

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public struct Struct16
{
    public int A;
    public int B;
    public int C;
    public int D;
}

public class Program
{
    public const int COUNT = 100000;
    
    static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<Program>();
    }

    [Benchmark]
    public int ArrayRefIndex()
    {
        Struct16[] myArray = new Struct16[COUNT];
        int sum = 0;
        for (int i = 0; i < myArray.Length; i++)
        {
            ref Struct16 value = ref myArray[i];
            sum += value.A = value.A + value.B + value.C + value.D;
        }
        return sum;
    }

    [Benchmark]
    public int ArraySpanIndex()
    {
        Struct16[] myArray = new Struct16[COUNT];
        int sum = 0;
        Span<Struct16> span = myArray.AsSpan();
        for (int i = 0; i < span.Length; i++)
        {
            ref Struct16 value = ref span[i];
            sum += value.A = value.A + value.B + value.C + value.D;
        }
        return sum;
    }
}

Upvotes: 8

Views: 6040

Answers (1)

Joseph atkinson

Reputation: 169

Short Answer

Span<T> represents a "contiguous region of arbitrary memory", which allows the JIT compiler to optimize the machine instructions it generates.

Long Answer

If you open your code in the Disassembly window (Debug -> Windows -> Disassembly), you will find the following in ArrayRefIndex():

ref Struct16 value = ref myArray[i];
00007FFC3E860DCC  movsxd      r8,ecx  
00007FFC3E860DCF  shl         r8,4  
00007FFC3E860DD3  lea         r8,[rax+r8+10h] // <----

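"lea" stands for Load Effective Address. Here it computes the element address as rax + r8 + 10h: the array reference, plus the index scaled by the 16-byte struct size (the "shl r8,4"), plus a 10h offset that appears to skip the array object's header to reach the first element. The array is still contiguous memory; the difference is that extra offset folded into the address calculation.

When we look at ArraySpanIndex, we can see that it does not have the "lea" instruction; it is replaced with a plain "add". I did not confirm this, but the span most likely already holds a reference to the first element, so only the scaled index needs to be added to it. Either way, this instruction is the only delta between the two functions, which narrows it down as the likely cause of the time difference.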

ref Struct16 value = ref span[i];
00007FFC3E8613FA  movsxd      r8,ecx  
00007FFC3E8613FD  shl         r8,4  
00007FFC3E861401  add         r8,rax  // <----
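
To make the address arithmetic above more concrete, here is a minimal sketch that checks two things: that the array indexer and the span indexer resolve to the same element, and that consecutive elements sit 16 bytes apart (the "shl r8,4" in both listings). The class AddressDemo and its method names are hypothetical, made up for illustration; it assumes a recent .NET runtime where System.Runtime.CompilerServices.Unsafe is available. It does not measure performance.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public struct Struct16
{
    public int A;
    public int B;
    public int C;
    public int D;
}

// Hypothetical helper for illustration only; not part of the original benchmark.
public static class AddressDemo
{
    // Byte distance from element 0 to element i, going through the array indexer.
    public static long ArrayByteOffset(Struct16[] array, int i)
    {
        return (long)Unsafe.ByteOffset(ref array[0], ref array[i]);
    }

    // Byte distance from the span's base reference to element i.
    public static long SpanByteOffset(Span<Struct16> span, int i)
    {
        return (long)Unsafe.ByteOffset(ref MemoryMarshal.GetReference(span), ref span[i]);
    }

    // True if array[i] and array.AsSpan()[i] refer to the same memory location.
    public static bool SameElement(Struct16[] array, int i)
    {
        Span<Struct16> span = array.AsSpan();
        return Unsafe.AreSame(ref array[i], ref span[i]);
    }
}
```

For an array of 8 elements, AddressDemo.ArrayByteOffset(array, 3) and AddressDemo.SpanByteOffset(array.AsSpan(), 3) should both give 48 (3 * 16), and SameElement(array, 3) should be true. Both loops touch the same memory; only the way the base address is formed differs.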

Upvotes: 6
