null
null

Reputation: 5255

Why is this benchmark not measuring any branch prediction penalty?

I came across the question Wrapping real numbers which asks for help on implementing a "wrap around" functionality for double values within a given range, stating the following constraint:

Note: I must not use if-else or ?: (conditional operator) just for performance reasons. When I am working with thousands of particles, checking for each coordinate for each axis reduces performance considerably.

The assumption seems to be that branch prediction would incur a significant penalty on performance. Andrew Morton asks

Have you tried it with an if? It could be that the CPU branch predictor gets it right often enough that the code isn't noticeably slowed down.

Well, I did:

Here are the results

Method Mean Error StdDev
WithoutBranches_randomData 85.68 ms 0.750 ms 0.922 ms
WithBranches_randomData 84.36 ms 0.427 ms 0.474 ms
WithoutBranches_fixedData 84.83 ms 0.954 ms 1.620 ms
WithBranches_fixedData 84.80 ms 0.965 ms 0.947 ms

To me it looks like there´s no significant difference in performance between the four benchmarked scenarios.

Why do the benchmark results not show a branch prediction penalty?

I do not write benchmarks very often and I know that it´s pretty easy to make mistakes. Is there anything wrong with the benchmark code that I wrote?

using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace WrapAroundBenchmark;

public class Program
{
    static void Main(string[] args)
    {
        BenchmarkRunner.Run<WrapAround>();
    }

    public class WrapAround
    {
        private const double PeriodicDistance = 10d;
        private const double Within = 4d;
        private const double Below = -1d;
        private const double Above = 15d;

        private const int N = 30_0000_000;
        private readonly double[] randomData;
        private readonly double[] fixedData;

        public WrapAround()
        {
            var random = new Random(42);
            randomData = Enumerable.Empty<double>().Concat(Enumerable.Repeat(Below, N / 3))
                                                   .Concat(Enumerable.Repeat(Within, N / 3))
                                                   .Concat(Enumerable.Repeat(Above, N / 3))
                                                   .ToArray();
            random.Shuffle(randomData);

            fixedData = Enumerable.Repeat(Below, N).ToArray();
        }

        [Benchmark]
        public void WithoutBranches_randomData()
        {
            foreach (var number in randomData)
            {
                WrapPositiveWithoutBranches(PeriodicDistance, number);
            }
        }

        [Benchmark]
        public void WithBranches_randomData()
        {
            foreach (var number in randomData)
            {
                WrapPositiveWithBranches(PeriodicDistance, number);
            }
        }

        [Benchmark]
        public void WithoutBranches_fixedData()
        {
            foreach (var number in fixedData)
            {
                WrapPositiveWithoutBranches(PeriodicDistance, number);
            }
        }

        [Benchmark]
        public void WithBranches_fixedData()
        {
            foreach (var number in fixedData)
            {
                WrapPositiveWithBranches(PeriodicDistance, number);
            }
        }

        private static double WrapPositiveWithoutBranches(double periodicDistance, double position)
        {
            return (position % periodicDistance + periodicDistance) % periodicDistance;
        }

        private static double WrapPositiveWithBranches(double periodicDistance, double position)
        {
            if (position < 0) return 0;
            if (position >= periodicDistance) return position % periodicDistance;
            return position;
        }
    }
}

Upvotes: 0

Views: 78

Answers (1)

null
null

Reputation: 5255

According to the suggestions in the comments I tried to avoid dead code elimination by returning all the result values as a List, e.g.

        [Benchmark]
        public List<double> WithBranches_fixedData()
        {
            var results = new List<double>(N);
            foreach (var number in fixedData)
            {
                results.Add(WrapPositiveWithBranches(PeriodicDistance, number));
            }
            return results;
        }

This indeed causes a noticable longer runtime of the benchmark, indicating that code which was previously optimized away is now executed. The branch prediction penalty is now visible in the results, when comparing randomData to fixedData: 11.290 s > 7.023 s and 4.443 s > 1.001 s Interstingly, there seems to be an even bigger difference between WithBranches and WithoutBranches.

Method Mean Error StdDev
WithoutBranches_randomData 11.290 s 0.1773 s 0.1659 s
WithBranches_randomData 4.443 s 0.0772 s 0.0722 s
WithoutBranches_fixedData 7.023 s 0.0766 s 0.0679 s
WithBranches_fixedData 1.001 s 0.0106 s 0.0089 s

Upvotes: 2

Related Questions