Reputation: 85
I'm working on a performance sensitive application and considering moving from .NET 6 to .NET 7.
While comparing the two versions, I found that .NET 7 is slower at executing a for loop on the first run.
Testing is done with two separate console applications with identical code, one on .NET 6 and the other on .NET 7, running in release mode, any CPU.
Test code:
using System.Diagnostics;

int size = 1000000;
Stopwatch sw = new();

// create array
float[] arr = new float[size];
for (int i = 0; i < size; i++)
    arr[i] = i;

Console.WriteLine(AppDomain.CurrentDomain.SetupInformation.TargetFrameworkName);

Console.WriteLine($"\nForLoop1");
ForLoop1();
ForLoop1();
ForLoop1();
ForLoop1();
ForLoop1();

Console.WriteLine($"\nForLoopArray");
ForLoopArray();
ForLoopArray();
ForLoopArray();
ForLoopArray();
ForLoopArray();

Console.WriteLine($"\nForLoop2");
ForLoop2();
ForLoop2();
ForLoop2();
ForLoop2();
ForLoop2();

void ForLoop1()
{
    sw.Restart();
    int sum = 0;
    for (int i = 0; i < size; i++)
        sum++;
    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

void ForLoopArray()
{
    sw.Restart();
    float sum = 0f;
    for (int i = 0; i < size; i++)
        sum += arr[i];
    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

void ForLoop2()
{
    sw.Restart();
    int sum = 0;
    for (int i = 0; i < size; i++)
        sum++;
    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}
The console output for the .NET 6 version:
.NETCoreApp,Version=v6.0
ForLoop1
2989 ticks (1000000)
2846 ticks (1000000)
2851 ticks (1000000)
3180 ticks (1000000)
2841 ticks (1000000)
ForLoopArray
8270 ticks (4.9994036E+11)
8443 ticks (4.9994036E+11)
8354 ticks (4.9994036E+11)
8952 ticks (4.9994036E+11)
8458 ticks (4.9994036E+11)
ForLoop2
2842 ticks (1000000)
2844 ticks (1000000)
3117 ticks (1000000)
2835 ticks (1000000)
2992 ticks (1000000)
And the .NET 7 version:
.NETCoreApp,Version=v7.0
ForLoop1
19658 ticks (1000000)
2921 ticks (1000000)
2967 ticks (1000000)
3190 ticks (1000000)
3722 ticks (1000000)
ForLoopArray
20041 ticks (4.9994036E+11)
8342 ticks (4.9994036E+11)
9212 ticks (4.9994036E+11)
8501 ticks (4.9994036E+11)
9726 ticks (4.9994036E+11)
ForLoop2
14016 ticks (1000000)
3008 ticks (1000000)
2885 ticks (1000000)
2882 ticks (1000000)
2888 ticks (1000000)
As you can see, the .NET 6 timings are very similar, whereas the .NET 7 timings show an initial high value (19658, 20041 and 14016).
Fiddling with the environment variables DOTNET_ReadyToRun and DOTNET_TieredPGO just makes things worse.
Why is this and how can it be rectified?
Upvotes: 3
Views: 409
Reputation: 141845
My guess is that this is connected to the new On-Stack Replacement (OSR) feature introduced in .NET 7. Enabling DOTNET_JitDisasmSummary on my machine (Windows PowerShell: $env:DOTNET_JitDisasmSummary=1) results in the following output:
ForLoop1
9: JIT compiled Program:<<Main>$>g__ForLoop1|0_0(byref) [Tier0, IL size=118, code size=291]
10: JIT compiled Program:<<Main>$>g__ForLoop1|0_0(byref) [Tier1-OSR @0x19, IL size=118, code size=571]
13420 ticks (1000000)
2431 ticks (1000000)
...
ForLoopArray
11: JIT compiled Program:<<Main>$>g__ForLoopArray|0_1(byref) [Tier0, IL size=129, code size=339]
12: JIT compiled Program:<<Main>$>g__ForLoopArray|0_1(byref) [Tier1-OSR @0x24, IL size=129, code size=609]
13: JIT compiled System.SpanHelpers:SequenceCompareTo(byref,int,byref,int) [Tier1, IL size=632, code size=329]
19380 ticks (4.9994036E+11)
10694 ticks (4.9994036E+11)
...
ForLoop2
14: JIT compiled Program:<<Main>$>g__ForLoop2|0_2(byref) [Tier0, IL size=118, code size=291]
15: JIT compiled Program:<<Main>$>g__ForLoop2|0_2(byref) [Tier1-OSR @0x19, IL size=118, code size=549]
11720 ticks (1000000)
2549 ticks (1000000)
...
Setting DOTNET_TC_QuickJitForLoops to 0 ($env:DOTNET_TC_QuickJitForLoops=0) "reverts" this behaviour (I'm not sure why, because the docs state that the default is false; maybe something changed in .NET 7):
ForLoop1
8: JIT compiled Program:<<Main>$>g__ForLoop1|0_0(byref) [Tier-0 switched to FullOpts, IL size=118, code size=577]
2590 ticks (1000000)
2535 ticks (1000000)
...
ForLoopArray
9: JIT compiled Program:<<Main>$>g__ForLoopArray|0_1(byref) [Tier-0 switched to FullOpts, IL size=129, code size=618]
10: JIT compiled System.SpanHelpers:SequenceCompareTo(byref,int,byref,int) [Tier1, IL size=632, code size=329]
10759 ticks (4.9994036E+11)
10816 ticks (4.9994036E+11)
...
ForLoop2
11: JIT compiled Program:<<Main>$>g__ForLoop2|0_2(byref) [Tier-0 switched to FullOpts, IL size=118, code size=555]
2446 ticks (1000000)
2509 ticks (1000000)
...
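If the environment variable helps, the same switch can also be set in the project file so that it travels with the application. The fragment below is a minimal sketch, assuming an SDK-style .csproj; TieredCompilationQuickJitForLoops is the MSBuild counterpart of DOTNET_TC_QuickJitForLoops, and the surrounding PropertyGroup is just a placeholder for wherever your project keeps its properties.

```xml
<!-- Sketch only: add to the existing .csproj of the .NET 7 console app. -->
<PropertyGroup>
  <!-- Methods containing loops skip Tier0/OSR and are compiled fully optimized on first call. -->
  <TieredCompilationQuickJitForLoops>false</TieredCompilationQuickJitForLoops>
</PropertyGroup>
```

The trade-off is that every method with a loop pays the cost of a fully optimizing JIT compile up front, so it is worth measuring total startup time and not only the first loop iteration.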
Possibly related discussion on GitHub
P.S.
If your code is performance-sensitive, and especially if it is sensitive to startup performance, it may be worth looking into Native AOT.
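For reference, opting a console app into Native AOT is mostly a project-file change. This is a minimal sketch, assuming an SDK-style .csproj targeting .NET 7; whether your application and its dependencies are compatible with AOT (for example, heavy use of reflection can be a problem) needs to be checked separately.

```xml
<!-- Sketch only: enables Native AOT compilation when the app is published. -->
<PropertyGroup>
  <PublishAot>true</PublishAot>
</PropertyGroup>
```

The native binary is then produced by publishing for a specific runtime identifier, for example dotnet publish -c Release -r win-x64. Because the code is compiled ahead of time, there is no JIT tiering at run time, so the first-call overhead shown above should not apply.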
Upvotes: 5