Matthew Watson

Reputation: 109812

Code runs 30% slower in .Net Core compared to .Net Framework - any way to speed things up?

Background

We are currently going through the process of converting our codebase from .Net Framework 4.8 to .Net Core 3.1.

Some of the code is very performance-sensitive. One example is some code that applies a Hamming window filter; I was somewhat dismayed to discover that the .Net Core 3.1-compiled code runs around 30% more slowly than the same code compiled for .Net Framework 4.8.

To reproduce

I created a multitargeted SDK-style project as follows:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net48;netcoreapp3.1</TargetFrameworks>
    <Optimize>true</Optimize>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|AnyCPU'">
    <PlatformTarget>x86</PlatformTarget>
  </PropertyGroup>
</Project>

The code for this project is as follows (the important code is the sum() method, which is called repeatedly from the for (int trial = ... loop):

using System;
using System.Diagnostics;

namespace FooBar
{
    class Program
    {
        static void Main()
        {
#if NET48
            Console.WriteLine("NET48: Is 64 bits = " + Environment.Is64BitProcess);
#elif NETCOREAPP3_1
            Console.WriteLine("NETCOREAPP3_1: Is 64 bits = " + Environment.Is64BitProcess);
#else
#error Invalid build, so refuse to compile.
#endif
            double[] array = new double[100_000_000];
            var sw = Stopwatch.StartNew();

            for (int trial = 0; trial < 100; ++trial)
            {
                sum(array);
            }

            Console.WriteLine("Average ms for calls to sum() = " + sw.ElapsedMilliseconds/100);
            Console.ReadLine();
        }

        static double sum(double[] array)
        {
            double s = 0;

            for (int i = 0; i < array.Length; ++i)
            {
                s += array[i];
            }

            return s;
        }
    }
}

Results

Timing a release x86 build for .Net Core 3.1 and .Net Framework 4.8 I get the following results:

.Net Core 3.1:

NETCOREAPP3_1: Is 64 bits = False
Average ms for calls to sum() = 122

.Net Framework 4.8:

NET48: Is 64 bits = False
Average ms for calls to sum() = 96

Thus the .Net Core 3.1 build takes roughly 27% longer than the .Net Framework 4.8 build (122 ms versus 96 ms), which is the "around 30%" slowdown mentioned above.

NOTE: This only affects the x86 build. For an x64 build, the times are similar between .Net Framework and .Net Core.

I find this most disappointing, particularly since I thought that .Net Core would be likely to have better optimization ...

Can anyone suggest a way to speed up the .Net Core output so that it is in the same ballpark as .Net Framework 4.8?


[EDIT] I've updated the code and the .csproj to the latest version I'm using for testing. I added some code to indicate which target and platform is running, just to be certain the right version is being run.
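
As an alternative to the compile-time #if check, the running process can report its own runtime. The following is a minimal sketch, not the code used for the timings in this question; the class name RuntimeInfo and the method Describe are made up for illustration. It relies on RuntimeInformation.FrameworkDescription and RuntimeInformation.ProcessArchitecture, which should be available on both targets.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not part of the original project).
static class RuntimeInfo
{
    // Builds a one-line description of the runtime that is actually executing:
    // the framework description string, the process architecture, and the
    // pointer size in bytes (4 for a 32-bit process, 8 for a 64-bit one).
    public static string Describe()
    {
        return RuntimeInformation.FrameworkDescription
            + " | process architecture = " + RuntimeInformation.ProcessArchitecture
            + " | pointer size = " + IntPtr.Size + " bytes";
    }
}
```

Calling Console.WriteLine(RuntimeInfo.Describe()); as the first statement of Main() would print this line for whichever executable was actually launched, independently of the conditional compilation symbols.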

With this edit, I am basically just timing how long it takes to sum all 100,000,000 elements of a large double[] array.

I can reproduce this on both my PCs and my laptop, which are running the latest Windows 10 and Visual Studio 2019 installations + latest .Net Core 3.1.

However, given that other people cannot reproduce this, I will take Lex Li's advice and post this on the Microsoft GitHub page.

Upvotes: 8

Views: 6573

Answers (2)

aepot

Reputation: 4824

Cannot reproduce.

It looks like .NET Core 3.1 is faster, at least for x86. I ran each build five or more times, and the output was nearly the same each time.

.NET Framework 4.8

Is 64 bits = False
Computed 4199,58 in 00:00:01.2679838
Computed 4199,58 in 00:00:01.1270864
Computed 4199,58 in 00:00:01.1163893
Computed 4199,58 in 00:00:01.1271687

Is 64 bits = True
Computed 4199,58 in 00:00:01.0910610
Computed 4199,58 in 00:00:00.9695353
Computed 4199,58 in 00:00:00.9601170
Computed 4199,58 in 00:00:00.9696420

.NET Core 3.1

Is 64 bits = False
Computed 4199,580000000003 in 00:00:00.9852276
Computed 4199,580000000003 in 00:00:00.9493986
Computed 4199,580000000003 in 00:00:00.9562083
Computed 4199,580000000003 in 00:00:00.9467359

Is 64 bits = True
Computed 4199,580000000003 in 00:00:01.0199652
Computed 4199,580000000003 in 00:00:00.9763987
Computed 4199,580000000003 in 00:00:00.9612935
Computed 4199,580000000003 in 00:00:00.9815544

Updated with new sample

NET48: Is 64 bits = False
Average ms for calls to sum() = 110

NETCOREAPP3_1: Is 64 bits = False
Average ms for calls to sum() = 110

Hardware

Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz

Base speed: 2,40 GHz
Sockets:    1
Cores:  4
Logical processors: 8
Virtualization: Enabled
L1 cache:   256 KB
L2 cache:   1,0 MB
L3 cache:   6,0 MB

Bonus

If the code is that performance-sensitive, SIMD may help.

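using System;
using System.Numerics;

// Fragment: `window` (a double[]) and `sw` (a Stopwatch) are assumed to be
// declared elsewhere; they are not shown in this snippet.
const int ITERS = 100000;

int vectorSize = Vector<double>.Count;
Console.WriteLine($"Vector size = {vectorSize}");

for (int trial = 0; trial < 4; ++trial)
{
    double windowSum = 0;
    sw.Restart();

    for (int iter = 0; iter < ITERS; ++iter)
    {
        Vector<double> accVector = Vector<double>.Zero;
        int i = 0;

        // Accumulate |x| one full vector at a time.
        for (; i <= window.Length - vectorSize; i += vectorSize)
        {
            Vector<double> v = new Vector<double>(window, i);
            accVector += Vector.Abs(v);
        }

        // Add the vector lanes together into a single scalar.
        windowSum = Vector.Dot(accVector, Vector<double>.One);

        // Scalar tail for lengths that are not a multiple of vectorSize.
        for (; i < window.Length; ++i)
        {
            windowSum += Math.Abs(window[i]);
        }
    }

    Console.WriteLine($"Computed {windowSum} in {sw.Elapsed}");
}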

Awesomeness of .NET Core is here :)

.NET Core 3.1

Is 64 bits = False
Vector size = 4
Computed 4199,58 in 00:00:00.3678926
Computed 4199,58 in 00:00:00.3046166
Computed 4199,58 in 00:00:00.2910941
Computed 4199,58 in 00:00:00.2900221

Is 64 bits = True
Vector size = 4
Computed 4199,58 in 00:00:00.3446433
Computed 4199,58 in 00:00:00.2616570
Computed 4199,58 in 00:00:00.2606452
Computed 4199,58 in 00:00:00.2582038
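
The same idea can be applied to the sum() method from the updated question. This is only a sketch under assumptions: the class name SimdSum is made up for illustration, it has not been timed here, and on .NET Framework Vector<T> may require the System.Numerics.Vectors NuGet package. Because the additions happen in a different order than in the scalar loop, the last digits of the result can differ for non-integer data.

```csharp
using System;
using System.Numerics;

// Hypothetical helper (not from the question or the answer above).
static class SimdSum
{
    // Sums all elements of `array`, using Vector<double> for the bulk of the
    // data and a scalar loop for the leftover tail elements.
    public static double Sum(double[] array)
    {
        if (array == null) throw new ArgumentNullException(nameof(array));

        int width = Vector<double>.Count;
        Vector<double> acc = Vector<double>.Zero;
        int i = 0;

        // Process one full vector of `width` doubles per iteration.
        for (; i <= array.Length - width; i += width)
        {
            acc += new Vector<double>(array, i);
        }

        // Collapse the vector lanes into a single scalar.
        double total = Vector.Dot(acc, Vector<double>.One);

        // Add the elements that did not fill a whole vector.
        for (; i < array.Length; ++i)
        {
            total += array[i];
        }

        return total;
    }
}
```

In the question's program, sum(array) could then be replaced by SimdSum.Sum(array) to compare both variants on each target.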

Upvotes: 4

Blindy

Reputation: 67487

Well, I gave it a try, and I included .NET 5 as well, and as expected they're pretty much identical in performance.

I would take this as a sign to use a more rigorous testing methodology (BenchmarkDotNet), because at this point I'm positive you're not running the correct executable, and BenchmarkDotNet takes care of that for you.

C:\Users\_\source\repos\ConsoleApp3\ConsoleApp3\bin\Release\net48>ConsoleApp3.exe
Computed 4199.58 in 00:00:01.0134120
Computed 4199.58 in 00:00:01.0136130
Computed 4199.58 in 00:00:01.0163664
Computed 4199.58 in 00:00:01.0161655

C:\Users\_\source\repos\ConsoleApp3\ConsoleApp3\bin\Release\net5>ConsoleApp3
Computed 4199.580000000003 in 00:00:01.0269673
Computed 4199.580000000003 in 00:00:01.0214385
Computed 4199.580000000003 in 00:00:01.0295102
Computed 4199.580000000003 in 00:00:01.0241006

C:\Users\_\source\repos\ConsoleApp3\ConsoleApp3\bin\Release\netcoreapp3.1>ConsoleApp3
Computed 4199.580000000003 in 00:00:01.0234075
Computed 4199.580000000003 in 00:00:01.0216327
Computed 4199.580000000003 in 00:00:01.0227448
Computed 4199.580000000003 in 00:00:01.0328213
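
For readers who want something in between a single Stopwatch reading and a full benchmarking library, the following hand-rolled sketch shows the two basic ingredients of a more careful measurement: untimed warm-up calls and a median over several timed runs. It is not BenchmarkDotNet and does not replace it; the names MiniBench and MedianMs are made up for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (illustration only; not BenchmarkDotNet).
static class MiniBench
{
    // Calls `work` `warmups` times without timing, then times it `runs` times
    // and returns the median elapsed time in milliseconds. Every value that
    // `work` returns is added to `sink` so the computation has a visible result.
    public static double MedianMs(Func<double> work, int warmups, int runs, out double sink)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

        sink = 0;
        for (int i = 0; i < warmups; ++i)
        {
            sink += work();
        }

        var samples = new double[runs];
        var sw = new Stopwatch();
        for (int i = 0; i < runs; ++i)
        {
            sw.Restart();
            sink += work();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        // Median: middle sample, or the mean of the two middle samples.
        Array.Sort(samples);
        int mid = runs / 2;
        return runs % 2 == 1 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2.0;
    }
}
```

With the question's program, a call such as MiniBench.MedianMs(() => sum(array), 3, 15, out double sink) would report the median time of fifteen calls after three warm-up calls; the counts are arbitrary example values.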

Upvotes: 0
