laptou

Reputation: 7029

Matrix3x2 Performance

In my graphics application, I can represent matrices using either SharpDX.Matrix3x2 or System.Numerics.Matrix3x2. However, when I ran both types through a performance test, SharpDX's matrices handily beat System.Numerics.Matrix3x2, by a margin of up to 70% in elapsed time. My test was a simple repeated multiplication; here's the code:

var times1 = new List<float>();

for (var i = 0; i < 100; i++)
{
    var sw = Stopwatch.StartNew();

    var mat = SharpDX.Matrix3x2.Identity;

    for (var j = 0; j < 10000; j++)
        mat *= SharpDX.Matrix3x2.Rotation(13);

    sw.Stop();

    times1.Add(sw.ElapsedTicks);
}

var times2 = new List<float>();

for (var i = 0; i < 100; i++)
{
    var sw = Stopwatch.StartNew();

    var mat = System.Numerics.Matrix3x2.Identity;

    for (var j = 0; j < 10000; j++)
        mat *= System.Numerics.Matrix3x2.CreateRotation(13);

    sw.Stop();

    times2.Add(sw.ElapsedTicks);
}

TestContext.WriteLine($"SharpDX: {times1.Average()}\nSystem.Numerics: {times2.Average()}");

I ran these tests on an Intel i5-6200U processor.

Now, my question is, how can SharpDX's matrices possibly be faster? Isn't System.Numerics.Matrix3x2 supposed to utilise SIMD instructions to execute faster?

The implementation of SharpDX.Matrix3x2 is available here, and as you can see, it's written in plain C#.

Upvotes: 0

Views: 729

Answers (2)

ErnieDingo

Reputation: 444

There are a few other things you need to consider with the testing. These are just side notes and won't affect your current results. I've done some testing like this as well.

Some functions in SharpDX take their arguments by value rather than by reference, and there are corresponding by-reference versions you might want to play with. You've used the operators in your testing (all fine, it's a comparable test!). It's just that in some situations the operators are slower than the by-reference functions.
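To make the operator versus by-reference distinction concrete, here is a minimal sketch. Mat3x2 is a hypothetical stand-in struct written for this example, not SharpDX's type, and Demo is a made-up helper; check SharpDX's source for the exact by-reference overloads it offers. The operator copies both operands and the result, while the ref/out form works on the caller's storage directly.

```csharp
using System;

// Hypothetical 3x2 affine matrix, written only to illustrate the two calling styles.
public struct Mat3x2
{
    public float M11, M12, M21, M22, M31, M32;

    public Mat3x2(float m11, float m12, float m21, float m22, float m31, float m32)
    {
        M11 = m11; M12 = m12; M21 = m21; M22 = m22; M31 = m31; M32 = m32;
    }

    // By-reference form: operands are read in place, result is written to the caller's variable.
    public static void Multiply(ref Mat3x2 a, ref Mat3x2 b, out Mat3x2 result)
    {
        result = new Mat3x2(
            a.M11 * b.M11 + a.M12 * b.M21,
            a.M11 * b.M12 + a.M12 * b.M22,
            a.M21 * b.M11 + a.M22 * b.M21,
            a.M21 * b.M12 + a.M22 * b.M22,
            a.M31 * b.M11 + a.M32 * b.M21 + b.M31,
            a.M31 * b.M12 + a.M32 * b.M22 + b.M32);
    }

    // Operator form: both operands are passed by value and the result is returned by value.
    public static Mat3x2 operator *(Mat3x2 a, Mat3x2 b)
    {
        Multiply(ref a, ref b, out var result);
        return result;
    }
}

public static class Demo
{
    // Multiplies the same pair both ways so the results can be compared.
    public static (Mat3x2 viaOperator, Mat3x2 viaRef) MultiplyBothWays()
    {
        var a = new Mat3x2(2, 0, 0, 2, 1, 1); // scale by 2, translate by (1, 1)
        var b = new Mat3x2(1, 0, 0, 1, 3, 4); // translate by (3, 4)
        var viaOperator = a * b;
        Mat3x2.Multiply(ref a, ref b, out var viaRef);
        return (viaOperator, viaRef);
    }
}
```

Both paths compute the same product; any difference is only in how many struct copies the JIT has to make, and it may well inline the operator and remove that difference, so measure before switching.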

Upvotes: 0

laptou

Reputation: 7029

It turns out that my testing logic was flawed - I was creating the rotation matrix inside the loop, which meant that I was testing the creation of rotation matrices and multiplication. I revised my testing code to look like this:

var times1 = new List<float>();

for (var i = 0; i < 100; i++)
{
    var sw = Stopwatch.StartNew();

    var mat = SharpDX.Matrix3x2.Identity;

    var s = SharpDX.Matrix3x2.Scaling(13);
    var r = SharpDX.Matrix3x2.Rotation(13);
    var t = SharpDX.Matrix3x2.Translation(13, 13);

    for (var j = 0; j < 10000; j++)
    {
        mat *= s;
        mat *= r;
        mat *= t;
    }

    sw.Stop();

    times1.Add(sw.ElapsedTicks);
}

var times2 = new List<float>();

for (var i = 0; i < 100; i++)
{
    var sw = Stopwatch.StartNew();

    var mat = System.Numerics.Matrix3x2.Identity;

    var s = System.Numerics.Matrix3x2.CreateScale(13);
    var r = System.Numerics.Matrix3x2.CreateRotation(13);
    var t = System.Numerics.Matrix3x2.CreateTranslation(13, 13);

    for (var j = 0; j < 10000; j++)
    {
        mat *= s;
        mat *= r;
        mat *= t;
    }

    sw.Stop();

    times2.Add(sw.ElapsedTicks);
}

Now the only thing performed inside the loop is multiplication, and with that change I began to get results indicating better performance from System.Numerics.Matrix3x2.
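The same measurement can be wrapped in a small helper, sketched here under a few assumptions: MultiplyTiming, TimeMultiplyMs and AverageMs are names made up for this example, the warm-up call is an addition that was not in my original test, and the time is read as milliseconds from Stopwatch.Elapsed rather than as raw ticks. Only System.Numerics is used, so it runs without SharpDX.

```csharp
using System;
using System.Diagnostics;
using System.Numerics;

public static class MultiplyTiming
{
    // Times `iterations` rounds of scale * rotation * translation, in milliseconds.
    public static double TimeMultiplyMs(int iterations)
    {
        // Build the operands once, outside the timed region.
        var s = Matrix3x2.CreateScale(13);
        var r = Matrix3x2.CreateRotation(13);
        var t = Matrix3x2.CreateTranslation(13, 13);
        var mat = Matrix3x2.Identity;

        var sw = Stopwatch.StartNew();
        for (var j = 0; j < iterations; j++)
        {
            mat *= s;
            mat *= r;
            mat *= t;
        }
        sw.Stop();

        // Use the result so the loop is less likely to be optimised away.
        GC.KeepAlive(mat.M11);
        return sw.Elapsed.TotalMilliseconds;
    }

    // Averages several timed runs after one untimed warm-up run.
    public static double AverageMs(int runs, int iterations)
    {
        TimeMultiplyMs(iterations); // warm-up: JIT compilation happens here

        double total = 0;
        for (var i = 0; i < runs; i++)
            total += TimeMultiplyMs(iterations);
        return total / runs;
    }
}
```

Calling MultiplyTiming.AverageMs(100, 10000) mirrors the run and iteration counts used above.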

Another point: I didn't pay attention to the fact that SIMD optimisations only take effect in 64-bit code. These are my test results before and after changing the platform to x64:

Platform Target | System.Numerics.Matrix3x2 | SharpDX.Matrix3x2
---------------------------------------------------------------
AnyCPU          | 168ms                     | 197ms
x64             | 1.40ms                    | 1.43ms

When I check Environment.Is64BitProcess under AnyCPU, it returns false - and the "Prefer 32-Bit" box in Visual Studio is greyed out, so I suspect that AnyCPU is just an alias for x86 in this case, which explains why the test is 2 orders of magnitude faster under x64.
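A quick way to confirm what the test process is actually running as is to print the relevant flags at the start of the test. This is a minimal sketch; RuntimeInfo and Describe are names made up for this example, while the properties it reads are standard .NET ones.

```csharp
using System;
using System.Numerics;

public static class RuntimeInfo
{
    // Summarises the process bitness and whether System.Numerics vectors are hardware accelerated.
    public static string Describe()
    {
        return $"64-bit process: {Environment.Is64BitProcess}, " +
               $"64-bit OS: {Environment.Is64BitOperatingSystem}, " +
               $"pointer size: {IntPtr.Size} bytes, " +
               $"SIMD accelerated: {Vector.IsHardwareAccelerated}";
    }
}
```

Writing RuntimeInfo.Describe() to the test output before the timing loops makes it obvious when a run has silently fallen back to a 32-bit process.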

Upvotes: 1
