Reputation: 21
I want to find the best way to multiply two arrays element-wise. This is one part of a wider project where performance is a consideration, but not the only one.
I started writing some functions today in C# (LINQPad), so the code hasn't been optimised in any way. The output from the code below is as follows:
Environment.ProcessorCount: 4
Vector<double>.Count: 4
For sequential: 129ms, sum: 2.30619276241231E+25
Plinq: 344ms, sum: 2.30619276241231E+25
Parallel.For: 137ms, 2.30619276241231E+25
Simd sequential: 100ms, sum: 2.30619276241231E+25
Simd parallel: 761ms
This consists of the execution time for the multiplication plus a sum over the results as a check. There are a few odd results here (and I'm a little rusty in C#, so it could well be my code): Plinq is much slower than the plain sequential loop, Parallel.For is no faster than it, and the parallel SIMD version is by far the slowest of all.
My code is below; there is a reference to the NuGet System.Numerics.Vectors package. I'd appreciate any comments, suggestions, corrections or alternatives...
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using System.Numerics;
using System.Collections.Concurrent;
void Main()
{
    var random = new Random();
    var arraySize = 20_000_001;
    var x = new double[arraySize];
    var y = new double[arraySize];
    for (var i = 0; i < x.Length; ++i)
    {
        x[i] = random.Next();
        y[i] = random.Next();
    }
    Console.WriteLine($"Environment.ProcessorCount: {Environment.ProcessorCount}");
    Console.WriteLine($"Vector<double>.Count: {Vector<double>.Count}\n");
    MultiplyFor(x, y);
    MultiplyPlinq(x, y);
    MultiplyParallelFor(x, y);
    MultiplySIMD(x, y);
    MultiplyParallelSIMD(x, y);
}

void MultiplyPlinq(double[] x, double[] y)
{
    var result = new double[x.Length];
    var sw = new Stopwatch();
    sw.Start();
    result = ParallelEnumerable.Range(0, x.Length).Select(i => x[i] * y[i]).ToArray();
    sw.Stop();
    Console.WriteLine($"Plinq: {sw.ElapsedMilliseconds}ms, sum: {SumCheck(result)}");
}

double SumCheck(double[] x)
{
    return Math.Round(x.Sum(), 4);
}

void MultiplyFor(double[] x, double[] y)
{
    var result = new double[x.Length];
    var sw = new Stopwatch();
    sw.Start();
    for (var i = 0; i < x.Length; ++i)
    {
        result[i] = x[i] * y[i];
    }
    sw.Stop();
    Console.WriteLine($"For sequential: {sw.ElapsedMilliseconds}ms, sum: {SumCheck(result)}");
}

void MultiplyParallelFor(double[] x, double[] y)
{
    var result = new double[x.Length];
    var sw = new Stopwatch();
    sw.Start();
    Parallel.For(0, x.Length, i =>
    {
        result[i] = x[i] * y[i];
    });
    sw.Stop();
    Console.WriteLine($"Parallel.For: {sw.ElapsedMilliseconds}ms, {SumCheck(result)}");
}

void MultiplySIMD(double[] x, double[] y)
{
    var sw = new Stopwatch();
    sw.Start();
    var result = MultiplyByVectors(x, y);
    sw.Stop();
    // 2 cores, 4 logical, 256b register
    Console.WriteLine($"Simd sequential: {sw.ElapsedMilliseconds}ms, sum: {SumCheck(result)}");
}

double[] MultiplyByVectors(double[] x, double[] y)
{
    var result = new double[x.Length];
    var vectorSize = Vector<double>.Count;
    int i;
    for (i = 0; i < x.Length - vectorSize; i += vectorSize)
    {
        var vx = new Vector<double>(x, i);
        var vy = new Vector<double>(y, i);
        (vx * vy).CopyTo(result, i);
    }
    for (; i < x.Length; i++)
    {
        result[i] = x[i] * y[i];
    }
    return result;
}

void MultiplyParallelSIMD(double[] x, double[] y)
{
    var sw = new Stopwatch();
    sw.Start();
    var chunkSize = (int)(x.Length / Environment.ProcessorCount);
    Parallel.For(0, Environment.ProcessorCount, i => {
        var complete = i * chunkSize;
        var take = Math.Min(chunkSize, x.Length - i * chunkSize);
        var xSegment = x.Skip((int)complete).Take((int)take);
        var ySegment = y.Skip((int)complete).Take((int)take);
        var result = MultiplyByVectors(xSegment.ToArray(), ySegment.ToArray());
    });
    sw.Stop();
    Console.WriteLine($"Simd parallel: {sw.ElapsedMilliseconds}ms");
}
Upvotes: 2
Views: 1638
Reputation: 43495
Parallel.For in its simplest form is not suitable for very granular workloads, because the overhead of invoking an anonymous function on each iteration negates the benefits of parallelism (anonymous functions can't be inlined). The solution is to partition the data, so that multiple partitions are processed in parallel, while each partition is processed with a fast direct loop:
Parallel.ForEach(Partitioner.Create(0, x.Length), range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        result[i] = x[i] * y[i];
    }
});
The built-in Partitioner, in its current implementation, creates as many partitions as the number of CPU cores x 3.
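If you want to experiment with the granularity yourself, there is also a Partitioner.Create overload that takes an explicit range size. A minimal sketch, assuming the same x, y and result arrays as in the snippet above (the rangeSize value is just an illustrative starting point to tune):

var rangeSize = 100_000; // hypothetical chunk size; tune for your workload
Parallel.ForEach(Partitioner.Create(0, x.Length, rangeSize), range =>
{
    // same direct inner loop as before, one delegate invocation per range
    for (int i = range.Item1; i < range.Item2; i++)
    {
        result[i] = x[i] * y[i];
    }
});

Larger ranges mean fewer delegate invocations but coarser load balancing, so the best value depends on the machine and the workload.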
Regarding parallelizing SIMD operations, in my own experiments I haven't observed impressive performance improvements on my PC. My theory (and this is just wild speculation, not an educated guess) is that the SIMD calculations happen so fast that the RAM can't keep up with the rate at which the CPU crunches the data.
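One way to test that theory is to combine the range partitioning above with the vectorised loop, reading the original arrays in place instead of copying segments with Skip/Take/ToArray as MultiplyParallelSIMD does. A minimal sketch, again assuming the same x, y and result arrays:

Parallel.ForEach(Partitioner.Create(0, x.Length), range =>
{
    var vectorSize = Vector<double>.Count;
    var i = range.Item1;
    // vectorised loop over the full vectors that fit inside this range
    for (; i <= range.Item2 - vectorSize; i += vectorSize)
    {
        var vx = new Vector<double>(x, i);
        var vy = new Vector<double>(y, i);
        (vx * vy).CopyTo(result, i);
    }
    // scalar loop for the few elements left at the end of the range
    for (; i < range.Item2; i++)
    {
        result[i] = x[i] * y[i];
    }
});

If this still doesn't beat the sequential SIMD version by much, that would be consistent with the loop being limited by memory bandwidth rather than by arithmetic.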
Upvotes: 3