I came across the question Wrapping real numbers, which asks for help on implementing "wrap around" functionality for `double` values within a given range, stating the following constraint:
Note: I must not use if-else or ?: (conditional operator) just for performance reasons. When I am working with thousands of particles, checking for each coordinate for each axis reduces performance considerably.
The assumption seems to be that branching would incur a significant performance penalty. Andrew Morton asks:
Have you tried it with an `if`? It could be that the CPU branch predictor gets it right often enough that the code isn't noticeably slowed down.
Well, I did:
I created a second variant of the method, `WrapPositiveWithBranches`, that uses `if` statements, and compared it in a benchmark project to the existing implementation, herein named `WrapPositiveWithoutBranches`.
To see the effect of branch prediction, the methods are invoked with different data: `fixedData`, which should always hit the same branch (every value being below the range), and `randomData`, which covers all branches in random order.
Here are the results:
Method | Mean | Error | StdDev |
---|---|---|---|
WithoutBranches_randomData | 85.68 ms | 0.750 ms | 0.922 ms |
WithBranches_randomData | 84.36 ms | 0.427 ms | 0.474 ms |
WithoutBranches_fixedData | 84.83 ms | 0.954 ms | 1.620 ms |
WithBranches_fixedData | 84.80 ms | 0.965 ms | 0.947 ms |
To me it looks like there's no significant difference in performance between the four benchmarked scenarios.
Why do the benchmark results not show a branch prediction penalty?
I do not write benchmarks very often and I know that it's pretty easy to make mistakes. Is there anything wrong with the benchmark code that I wrote?
```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace WrapAroundBenchmark;

public class Program
{
    static void Main(string[] args)
    {
        BenchmarkRunner.Run<WrapAround>();
    }

    public class WrapAround
    {
        private const double PeriodicDistance = 10d;
        private const double Within = 4d;
        private const double Below = -1d;
        private const double Above = 15d;
        private const int N = 300_000_000;

        private readonly double[] randomData;
        private readonly double[] fixedData;

        public WrapAround()
        {
            var random = new Random(42);
            randomData = Enumerable.Repeat(Below, N / 3)
                .Concat(Enumerable.Repeat(Within, N / 3))
                .Concat(Enumerable.Repeat(Above, N / 3))
                .ToArray();
            random.Shuffle(randomData);
            fixedData = Enumerable.Repeat(Below, N).ToArray();
        }

        [Benchmark]
        public void WithoutBranches_randomData()
        {
            foreach (var number in randomData)
            {
                WrapPositiveWithoutBranches(PeriodicDistance, number);
            }
        }

        [Benchmark]
        public void WithBranches_randomData()
        {
            foreach (var number in randomData)
            {
                WrapPositiveWithBranches(PeriodicDistance, number);
            }
        }

        [Benchmark]
        public void WithoutBranches_fixedData()
        {
            foreach (var number in fixedData)
            {
                WrapPositiveWithoutBranches(PeriodicDistance, number);
            }
        }

        [Benchmark]
        public void WithBranches_fixedData()
        {
            foreach (var number in fixedData)
            {
                WrapPositiveWithBranches(PeriodicDistance, number);
            }
        }

        private static double WrapPositiveWithoutBranches(double periodicDistance, double position)
        {
            return (position % periodicDistance + periodicDistance) % periodicDistance;
        }

        private static double WrapPositiveWithBranches(double periodicDistance, double position)
        {
            // Wrap negative positions into [0, periodicDistance) so that both
            // variants compute the same result.
            if (position < 0) return (position % periodicDistance + periodicDistance) % periodicDistance;
            if (position >= periodicDistance) return position % periodicDistance;
            return position;
        }
    }
}
```
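As a quick sanity check (a sketch, not part of the benchmark project), the two implementations can be compared on the three representative inputs before timing them. The local functions below mirror the wrap logic, with the negative branch wrapping via the same modulo formula so both variants agree:

```csharp
using System;

const double PeriodicDistance = 10d;

double WithoutBranches(double pd, double p) => (p % pd + pd) % pd;

double WithBranches(double pd, double p)
{
    // Wrap negatives into [0, pd) with the same modulo formula.
    if (p < 0) return (p % pd + pd) % pd;
    if (p >= pd) return p % pd;
    return p;
}

// Compare both variants on the benchmark's sample values plus edge cases.
foreach (var p in new[] { -1d, 4d, 15d, -10d, 0d })
{
    double a = WithoutBranches(PeriodicDistance, p);
    double b = WithBranches(PeriodicDistance, p);
    Console.WriteLine($"{p,4}: {a} vs {b} ({(a == b ? "match" : "MISMATCH")})");
}
```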
Following the suggestions in the comments, I tried to avoid dead-code elimination by returning all the result values as a `List<double>`, e.g.

```csharp
[Benchmark]
public List<double> WithBranches_fixedData()
{
    var results = new List<double>(N);
    foreach (var number in fixedData)
    {
        results.Add(WrapPositiveWithBranches(PeriodicDistance, number));
    }
    return results;
}
```
This indeed causes a noticeably longer runtime of the benchmark, indicating that code which was previously optimized away is now actually executed.
The branch prediction penalty is now visible in the results when comparing `randomData` to `fixedData`: 11.290 s vs. 7.023 s for `WithoutBranches`, and 4.443 s vs. 1.001 s for `WithBranches`.
Interestingly, there seems to be an even bigger difference between `WithBranches` and `WithoutBranches`.
Method | Mean | Error | StdDev |
---|---|---|---|
WithoutBranches_randomData | 11.290 s | 0.1773 s | 0.1659 s |
WithBranches_randomData | 4.443 s | 0.0772 s | 0.0722 s |
WithoutBranches_fixedData | 7.023 s | 0.0766 s | 0.0679 s |
WithBranches_fixedData | 1.001 s | 0.0106 s | 0.0089 s |
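Returning a `List<double>` adds a large allocation and an `Add` call to every iteration of the measured loop, which inflates the absolute numbers. A lighter-weight way to keep the JIT from eliminating the calls, sketched below with a hypothetical method name and not benchmarked here, is to fold the results into a single `double` and return it, which BenchmarkDotNet will consume:

```csharp
[Benchmark]
public double WithBranches_fixedData_Sum()
{
    // Accumulating into a double keeps every call observable to the JIT
    // without allocating a multi-gigabyte list on each invocation.
    double sum = 0;
    foreach (var number in fixedData)
    {
        sum += WrapPositiveWithBranches(PeriodicDistance, number);
    }
    return sum;
}
```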