Reputation: 109812
We are currently going through the process of converting our codebase from .Net Framework 4.8 to .Net Core 3.1.
Some of the code is very performance-sensitive. One example is some code that applies a Hamming window filter; I was somewhat dismayed to discover that the .Net Core 3.1-compiled code runs around 30% more slowly than the same code compiled for .Net Framework 4.8.
I created a multitargeted SDK-style project as follows:
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFrameworks>net48;netcoreapp3.1</TargetFrameworks>
<Optimize>true</Optimize>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|AnyCPU'">
<PlatformTarget>x86</PlatformTarget>
</PropertyGroup>
</Project>
The code for this project is as follows (the important code is inside the for (int iter = ... loop):
using System;
using System.Diagnostics;
namespace FooBar
{
class Program
{
static void Main()
{
#if NET48
Console.WriteLine("NET48: Is 64 bits = " + Environment.Is64BitProcess);
#elif NETCOREAPP3_1
Console.WriteLine("NETCOREAPP3_1: Is 64 bits = " + Environment.Is64BitProcess);
#else
#error Invalid build, so refuse to compile.
#endif
double[] array = new double[100_000_000];
var sw = Stopwatch.StartNew();
for (int trial = 0; trial < 100; ++trial)
{
sum(array);
}
Console.WriteLine("Average ms for calls to sum() = " + sw.ElapsedMilliseconds/100);
Console.ReadLine();
}
static double sum(double[] array)
{
double s = 0;
for (int i = 0; i < array.Length; ++i)
{
s += array[i];
}
return s;
}
}
}
Timing a release x86 build for .Net Core 3.1 and .Net Framework 4.8, I get the following results:
.Net Core 3.1:
NETCOREAPP3_1: Is 64 bits = False
Average ms for calls to sum() = 122
.Net Framework 4.8:
NET48: Is 64 bits = False
Average ms for calls to sum() = 96
Thus the .Net Core 3.1 results are around 30% slower than .Net Framework 4.8.
NOTE: This only affects the x86 build. For an x64 build, the times are similar between .Net Framework and .Net Core.
I find this most disappointing, particularly since I thought that .Net Core would be likely to have better optimization ...
Can anyone suggest a way to speed up the .Net Core output so that it is in the same ballpark as .Net Framework 4.8?
[EDIT] I've updated the code and the .csproj to the latest version I'm using for testing. I added some code to indicate which target and platform is running, just to be certain the right version is being run.
With this edit, I am basically just timing how long it takes to sum all 100,000,000 elements of a large double[] array.
I can reproduce this on both my PCs and my laptop, which are running the latest Windows 10 and Visual Studio 2019 installations + latest .Net Core 3.1.
However, given that other people cannot reproduce this, I will take Lex Li's advice and post this on the Microsoft GitHub page.
Upvotes: 8
Views: 6573
Reputation: 4824
Cannot reproduce.
It looks like .NET Core 3.1 is faster, at least for x86. I ran each build five or more times, and the output is nearly the same each time.
.NET Framework 4.8
Is 64 bits = False
Computed 4199,58 in 00:00:01.2679838
Computed 4199,58 in 00:00:01.1270864
Computed 4199,58 in 00:00:01.1163893
Computed 4199,58 in 00:00:01.1271687
Is 64 bits = True
Computed 4199,58 in 00:00:01.0910610
Computed 4199,58 in 00:00:00.9695353
Computed 4199,58 in 00:00:00.9601170
Computed 4199,58 in 00:00:00.9696420
.NET Core 3.1
Is 64 bits = False
Computed 4199,580000000003 in 00:00:00.9852276
Computed 4199,580000000003 in 00:00:00.9493986
Computed 4199,580000000003 in 00:00:00.9562083
Computed 4199,580000000003 in 00:00:00.9467359
Is 64 bits = True
Computed 4199,580000000003 in 00:00:01.0199652
Computed 4199,580000000003 in 00:00:00.9763987
Computed 4199,580000000003 in 00:00:00.9612935
Computed 4199,580000000003 in 00:00:00.9815544
NET48: Is 64 bits = False
Average ms for calls to sum() = 110
NETCOREAPP3_1: Is 64 bits = False
Average ms for calls to sum() = 110
Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz
Base speed: 2,40 GHz
Sockets: 1
Cores: 4
Logical processors: 8
Virtualization: Enabled
L1 cache: 256 KB
L2 cache: 1,0 MB
L3 cache: 6,0 MB
If the code is that performance-sensitive, SIMD may help.
using System.Numerics;
const int ITERS = 100000;
int vectorSize = Vector<double>.Count;
Console.WriteLine($"Vector size = {vectorSize}");
for (int trial = 0; trial < 4; ++trial)
{
double windowSum = 0;
sw.Restart();
for (int iter = 0; iter < ITERS; ++iter)
{
Vector<double> accVector = Vector<double>.Zero;
// Note: any trailing elements that do not fill a whole vector are skipped by this loop.
for (int i = 0; i <= window.Length - vectorSize; i += vectorSize)
{
Vector<double> v = new Vector<double>(window, i);
accVector += Vector.Abs(v);
}
windowSum = Vector.Dot(accVector, Vector<double>.One);
}
Console.WriteLine($"Computed {windowSum} in {sw.Elapsed}");
}
Awesomeness of .NET Core is here :)
.NET Core 3.1
Is 64 bits = False
Vector size = 4
Computed 4199,58 in 00:00:00.3678926
Computed 4199,58 in 00:00:00.3046166
Computed 4199,58 in 00:00:00.2910941
Computed 4199,58 in 00:00:00.2900221
Is 64 bits = True
Vector size = 4
Computed 4199,58 in 00:00:00.3446433
Computed 4199,58 in 00:00:00.2616570
Computed 4199,58 in 00:00:00.2606452
Computed 4199,58 in 00:00:00.2582038
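For the simpler sum() benchmark from the question, the same idea can be sketched as below. This is only a minimal sketch: the VectorSum class and its Sum method are hypothetical names that appear in neither the question nor the snippet above. It adds a scalar loop for the tail, so arrays whose length is not a multiple of Vector<double>.Count are still summed completely. On .NET Framework, Vector<T> comes from the System.Numerics.Vectors NuGet package. Because the additions happen in a different order than in the scalar loop, the last digits of the result may differ.

```csharp
using System;
using System.Numerics;

static class VectorSum
{
    // Hypothetical helper: sums all elements using Vector<double> for
    // full-width chunks and a scalar loop for the leftover tail.
    public static double Sum(double[] values)
    {
        int width = Vector<double>.Count;
        Vector<double> acc = Vector<double>.Zero;

        int i = 0;
        for (; i <= values.Length - width; i += width)
        {
            acc += new Vector<double>(values, i);
        }

        // Collapse the per-lane partial sums into a single value.
        double total = Vector.Dot(acc, Vector<double>.One);

        // Add the elements that did not fill a whole vector.
        for (; i < values.Length; ++i)
        {
            total += values[i];
        }

        return total;
    }
}

static class Program
{
    static void Main()
    {
        double[] data = { 1, 2, 3, 4, 5, 6, 7 };
        Console.WriteLine(VectorSum.Sum(data));
    }
}
```

Whether this beats the scalar loop on a particular x86 machine still has to be measured; the timings above are for the original window code, not for this sketch.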
Upvotes: 4
Reputation: 67487
Well, I gave it a try, and I included .NET 5 as well, and as expected they're pretty much identical in performance.
I would take this as a sign to use a more rigorous testing methodology (BenchmarkDotNet), because at this point I'm positive you're not running the correct executable, and BenchmarkDotNet takes care of that for you.
C:\Users\_\source\repos\ConsoleApp3\ConsoleApp3\bin\Release\net48>ConsoleApp3.exe
Computed 4199.58 in 00:00:01.0134120
Computed 4199.58 in 00:00:01.0136130
Computed 4199.58 in 00:00:01.0163664
Computed 4199.58 in 00:00:01.0161655
C:\Users\_\source\repos\ConsoleApp3\ConsoleApp3\bin\Release\net5>ConsoleApp3
Computed 4199.580000000003 in 00:00:01.0269673
Computed 4199.580000000003 in 00:00:01.0214385
Computed 4199.580000000003 in 00:00:01.0295102
Computed 4199.580000000003 in 00:00:01.0241006
C:\Users\_\source\repos\ConsoleApp3\ConsoleApp3\bin\Release\netcoreapp3.1>ConsoleApp3
Computed 4199.580000000003 in 00:00:01.0234075
Computed 4199.580000000003 in 00:00:01.0216327
Computed 4199.580000000003 in 00:00:01.0227448
Computed 4199.580000000003 in 00:00:01.0328213
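A BenchmarkDotNet version of the sum() test could look roughly like the sketch below. It is not what produced the numbers above. It assumes the BenchmarkDotNet NuGet package is referenced and that the project targets both net48 and netcoreapp3.1. The SumBenchmark class name is hypothetical, and the sketch does not force an x86 process, so the platform would still need to be configured separately to match the question.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

// One job per runtime, so both targets are measured in the same run.
[SimpleJob(RuntimeMoniker.Net48)]
[SimpleJob(RuntimeMoniker.NetCoreApp31)]
public class SumBenchmark
{
    private double[] array;

    // Allocate the array once, outside the measured code.
    [GlobalSetup]
    public void Setup() => array = new double[100_000_000];

    // Same loop as sum() in the question.
    [Benchmark]
    public double Sum()
    {
        double s = 0;
        for (int i = 0; i < array.Length; ++i)
        {
            s += array[i];
        }
        return s;
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<SumBenchmark>();
}
```

BenchmarkDotNet builds and launches a separate process for each job and labels each result with its runtime, which removes the doubt about which executable was actually timed.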
Upvotes: 0