Reputation: 1291
I'm seeing some strange behavior when performance-testing a method I wrote. If I comment out (disable) any one of the returns inside the if statements, the run time drops from 400 ms to 4 ms, almost as if the code is being compiled away and never actually runs. I could see the compiler doing that if disabling a return left only one possible result (always true or always false), because it could then treat the result as a constant bool instead of running the code.
Does anyone know what might be going on, or have a recommendation for a better way to run the test?
My Test Code:
Vec3 spherePos = new Vec3(43.7527, 75.9756, 0);
double sphereRadisSq = 50 * 50;
Vec3 rayPos = new Vec3(-5.32301, 5.97157, -112.983);
Vec3 rayDir = new Vec3(0.457841, 0.680324, 0.572312);
sw.Reset();
sw.Start();
bool res = false;
for (int i = 0; i < 10000000; i++)
{
    res = Intersect.RaySphereFast(rayPos, rayDir, spherePos, sphereRadisSq);
}
sw.Stop();
Debug.Log($"testTime: {sw.ElapsedMilliseconds} ms");
Debug.Log(res);
And the Static Method:
public static bool RaySphereFast(Vec3 _rp, Vec3 _rd, Vec3 _sp, double _srsq)
{
    double rs = Vec3.DistanceFast(_rp, _sp);
    if (rs < _srsq)
    {
        return (true); // <-- When I disable this one
    }
    Vec3 p = Vec3.ProjectFast(_sp, _rp, _rd);
    double pr = Vec3.Dot(_rd, (p - _rp));
    if (pr < 0)
    {
        return (false); // <-- Or when I disable this one
    }
    double ps = Vec3.DistanceFast(p, _sp);
    if (ps < _srsq)
    {
        return (true); // <-- Or when I disable this one
    }
    return (false);
}
Vec3 struct (slimmed down):
public struct Vec3
{
    public Vec3(double _x, double _y, double _z)
    {
        x = _x;
        y = _y;
        z = _z;
    }
    public double x { get; }
    public double y { get; }
    public double z { get; }
    public static double DistanceFast(Vec3 _v0, Vec3 _v1)
    {
        double x = (_v1.x - _v0.x);
        double y = (_v1.y - _v0.y);
        double z = (_v1.z - _v0.z);
        return ((x * x) + (y * y) + (z * z));
    }
    public static double Dot(Vec3 _v0, Vec3 _v1)
    {
        return ((_v0.x * _v1.x) + (_v0.y * _v1.y) + (_v0.z * _v1.z));
    }
    public static Vec3 ProjectFast(Vec3 _p, Vec3 _a, Vec3 _d)
    {
        Vec3 ap = _p - _a;
        return (_a + Vec3.Dot(ap, _d) * _d);
    }
    public static Vec3 operator +(Vec3 _v0, Vec3 _v1)
    {
        return (new Vec3(_v0.x + _v1.x, _v0.y + _v1.y, _v0.z + _v1.z));
    }
    public static Vec3 operator -(Vec3 _v0, Vec3 _v1)
    {
        return new Vec3(_v0.x - _v1.x, _v0.y - _v1.y, _v0.z - _v1.z);
    }
    public static Vec3 operator *(double _d1, Vec3 _v0)
    {
        return new Vec3(_d1 * _v0.x, _d1 * _v0.y, _d1 * _v0.z);
    }
}
Upvotes: 2
Views: 177
Reputation: 1112
There are a few interesting things going on here. As others have pointed out, when you comment out one of the returns, the method RaySphereFast becomes small enough to inline, and indeed the JIT decides to inline it. The helper methods it calls are inlined in turn, so the loop body ends up with no calls.
Once that happens, the JIT "struct promotes" the various Vec3 instances, and since you have initialized all the fields with constants, it propagates those constants and folds them through the various operations. As a result, the JIT realizes that the result of the call will always be true.
Since every iteration of the loop produces the same value, the JIT realizes that none of the computations in the loop are actually necessary (the result is already known) and deletes them all. So in the "fast" version you are timing an empty loop:
G_M52940_IG04:
BF01000000 mov edi, 1
FFC1 inc ecx
81F980969800 cmp ecx, 0x989680
7CF1 jl SHORT G_M52940_IG04
while in the "slow" version the call doesn't get inlined and none of this optimization kicks in:
G_M32193_IG04:
488D4C2478 lea rcx, bword ptr [rsp+78H]
C4617B1109 vmovsd qword ptr [rcx], xmm9
C4617B115108 vmovsd qword ptr [rcx+8], xmm10
C4617B115910 vmovsd qword ptr [rcx+16], xmm11
488D4C2460 lea rcx, bword ptr [rsp+60H]
C4617B1121 vmovsd qword ptr [rcx], xmm12
C4617B116908 vmovsd qword ptr [rcx+8], xmm13
C4617B117110 vmovsd qword ptr [rcx+16], xmm14
488D4C2448 lea rcx, bword ptr [rsp+48H]
C4E17B1131 vmovsd qword ptr [rcx], xmm6
C4E17B117908 vmovsd qword ptr [rcx+8], xmm7
C4617B114110 vmovsd qword ptr [rcx+16], xmm8
488D4C2478 lea rcx, bword ptr [rsp+78H]
488D542460 lea rdx, bword ptr [rsp+60H]
4C8D442448 lea r8, bword ptr [rsp+48H]
C4E17B101D67010000 vmovsd xmm3, qword ptr [reloc @RWD64]
E8D2F8FFFF call X:RaySphereFast(struct,struct,struct,double):bool
8BD8 mov ebx, eax
FFC7 inc edi
81FF80969800 cmp edi, 0x989680
7C95 jl SHORT G_M32193_IG04
If you are really interested in benchmarking the speed of RaySphereFast, make sure to invoke it with different or non-constant arguments on each iteration, and also make sure to consume the result of each iteration.
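As a rough illustration of that advice, here is a minimal sketch of a loop that feeds a different ray origin to each call and accumulates every result into a counter. It is not the asker's harness: the `RayBench` class, the `CountHits` helper, the random input ranges, and the use of `Console.WriteLine` instead of Unity's `Debug.Log` are all assumptions made for the example, and `Vec3` / `Intersect` are compact copies of the question's code so the block compiles on its own.

```csharp
using System;
using System.Diagnostics;

// Compact copy of the question's Vec3, included only so the sketch is self-contained.
public struct Vec3
{
    public Vec3(double _x, double _y, double _z) { x = _x; y = _y; z = _z; }
    public double x { get; }
    public double y { get; }
    public double z { get; }

    // Squared distance, as in the question's DistanceFast.
    public static double DistanceFast(Vec3 a, Vec3 b)
    {
        double dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
        return dx * dx + dy * dy + dz * dz;
    }
    public static double Dot(Vec3 a, Vec3 b) => a.x * b.x + a.y * b.y + a.z * b.z;
    public static Vec3 ProjectFast(Vec3 p, Vec3 a, Vec3 d) => a + Dot(p - a, d) * d;
    public static Vec3 operator +(Vec3 a, Vec3 b) => new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
    public static Vec3 operator -(Vec3 a, Vec3 b) => new Vec3(a.x - b.x, a.y - b.y, a.z - b.z);
    public static Vec3 operator *(double s, Vec3 v) => new Vec3(s * v.x, s * v.y, s * v.z);
}

public static class Intersect
{
    // Same logic as the question's method, written more compactly.
    public static bool RaySphereFast(Vec3 rp, Vec3 rd, Vec3 sp, double srsq)
    {
        if (Vec3.DistanceFast(rp, sp) < srsq) return true;   // origin inside the sphere
        Vec3 p = Vec3.ProjectFast(sp, rp, rd);
        if (Vec3.Dot(rd, p - rp) < 0) return false;          // sphere is behind the ray
        return Vec3.DistanceFast(p, sp) < srsq;
    }
}

public static class RayBench
{
    // Hypothetical helper: every call gets a different origin, and every
    // result feeds the returned count, so the work cannot be folded away.
    public static int CountHits(Vec3[] origins, Vec3 dir, Vec3 sphere, double radiusSq)
    {
        int hits = 0;
        for (int i = 0; i < origins.Length; i++)
        {
            if (Intersect.RaySphereFast(origins[i], dir, sphere, radiusSq)) hits++;
        }
        return hits;
    }

    public static void Main()
    {
        // Inputs are generated at run time, so the JIT cannot treat them as constants.
        var rng = new Random(12345);
        var origins = new Vec3[1000000];
        for (int i = 0; i < origins.Length; i++)
        {
            origins[i] = new Vec3(rng.NextDouble() * 400 - 200,
                                  rng.NextDouble() * 400 - 200,
                                  rng.NextDouble() * 400 - 200);
        }
        var dir = new Vec3(0.457841, 0.680324, 0.572312);
        var sphere = new Vec3(43.7527, 75.9756, 0);

        var sw = Stopwatch.StartNew();
        int hits = CountHits(origins, dir, sphere, 50 * 50);
        sw.Stop();
        Console.WriteLine($"testTime: {sw.ElapsedMilliseconds} ms, hits: {hits}");
    }
}
```

Printing the hit count is what "consumes" the results here. Reading from an array adds memory traffic to the measurement, so this is only one possible way to vary the inputs.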
Upvotes: 3
Reputation: 3443
Just to add an (obvious) disclaimer to the answer from @Matthew Watson:
The results depend on the .NET version, JIT version, etc. FYI, I cannot reproduce such a difference; the results come back pretty much equivalent in my environment.
I'm using BenchmarkDotNet with .NET Core 2.1.0; see details below.
// * Summary *
BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-4700MQ CPU 2.40GHz (Max: 1.08GHz) (Haswell), 1 CPU, 8 logical and 4 physical cores
Frequency=2338346 Hz, Resolution=427.6527 ns, Timer=TSC
.NET Core SDK=2.2.100-preview1-009349
[Host] : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT
DefaultJob : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT
Method | Mean | Error | StdDev |
----------------------- |---------:|----------:|----------:|
RaySphereFast_Original | 40.06 ns | 0.3693 ns | 0.3455 ns |
RaySphereFast_NoReturn | 40.46 ns | 0.0860 ns | 0.0805 ns |
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
1 ns : 1 Nanosecond (0.000000001 sec)
// ***** BenchmarkRunner: End *****
Run time: 00:00:34 (34.86 sec), executed benchmarks: 2
// * Artifacts cleanup *
Upvotes: 1
Reputation: 109567
This is likely happening because, when you comment out one of the returns, the complexity of the method falls below the threshold above which automatic inlining is disabled.
This inlining is not visible in the generated IL; it is done by the JIT compiler.
We can test this hypothesis by decorating the method in question with a [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute.
When I tried this with your code I obtained the following results (release, x64 build):
Original code: 302 ms
First return commented out: 2 ms
Decorated with AggressiveInlining: 2 ms
The time with the first return commented out is the same as what I obtain when decorating the method with AggressiveInlining (leaving the first return enabled).
Therefore I conclude that the hypothesis is correct.
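For reference, this is roughly what the decoration looks like. It is a minimal sketch, not the code I timed: the `InliningDemo` class and both method names are made up for illustration. `MethodImplOptions.NoInlining` is shown alongside as the opposite switch, which may be handy if you want the call to stay in the loop while measuring.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class InliningDemo
{
    // Hypothetical method: asks the JIT to inline it even if its own
    // heuristics would otherwise decline. This is a request, not a guarantee.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool InsideInlined(double distSq, double radiusSq)
    {
        if (distSq < 0) return false;
        if (distSq < radiusSq) return true;
        return false;
    }

    // Hypothetical method: same logic, but the JIT is told never to inline it,
    // so a call instruction remains at every call site.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool InsideNotInlined(double distSq, double radiusSq)
    {
        if (distSq < 0) return false;
        if (distSq < radiusSq) return true;
        return false;
    }

    public static void Main()
    {
        // The attribute only affects code generation, never the result.
        Console.WriteLine(InsideInlined(4.0, 9.0) == InsideNotInlined(4.0, 9.0));
    }
}
```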
Upvotes: 5