Alois Kraus

Reputation: 13535

Why do performance measurements differ?

I have a simple method which converts an array from one type to another. I wanted to find out which implementation is the fastest, but so far I get differing results from which I cannot conclude which one is really faster, or by what margin.

Since the conversion only involves allocating memory, reading the array and converting the values, I am surprised that the numbers are not more stable. I want to know how I can make measurements that are accurate, meaningful and do not change from one day to the next. The differences are about 20% between days.

There are of course differences between the JITer of .NET 3.5 and 4.0, between debug and release builds, between running under a debugger or not (a debugger disables JIT optimizations unless you turn that option off), and in the code the C# compiler generates for DEBUG versus RELEASE (mainly nop instructions and extra temporary variables in the IL).

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace PerfTest
{
    class Program
    {
        const int RUNS = 10 * 1000 * 1000;


        static void Main(string[] args)
        {
            int[] array = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43 };

            var s2 = Stopwatch.StartNew();
            for (int i = 0; i < RUNS; i++)
            {
                float[] arr = Cast(array);
            }
            s2.Stop();
            GC.Collect();

            var s3 = Stopwatch.StartNew();
            for (int i = 0; i < RUNS; i++)
            {
                float[] arr = Cast2(array);
            }
            s3.Stop();
            GC.Collect();

            var s4 = Stopwatch.StartNew();
            for (int i = 0; i < RUNS; i++)
            {
                var arr = CastSafe(array);
            }
            s4.Stop();


            Console.WriteLine("Times: {0} {1} {2}", s2.ElapsedMilliseconds, s3.ElapsedMilliseconds, s4.ElapsedMilliseconds);
        }

        // Reference cast implementation (pointer increment) to check performance
        public static unsafe float[] Cast(int[] input)
        {
            int N = input.Length;
            float[] output = new float[N];

            fixed (int* pIStart = &input[0])
            {
                int* pI = pIStart;
                fixed (float* pOStart = &output[0])
                {
                    float* pO = pOStart;

                    for (int i = 0; i < N; i++)
                    {
                        *pO = (float)*pI;
                        pI++;
                        pO++;
                    }
                }
            }

            return output;
        }

        // Reference cast implementation (indexed pointer access) to check performance
        public static unsafe float[] Cast2(int[] input)
        {
            int N = input.Length;
            float[] output = new float[N];
            fixed (int* pIStart = &input[0])
            {
                int* pI = pIStart;
                fixed (float* pOStart = &output[0])
                {
                    float* pO = pOStart;

                    for (int i = 0; i < N; i++)
                    {
                        pO[i] = (float) pI[i];
                    }
                }
            }

            return output;
        }
        public static float[] CastSafe(int[] input)
        {
            int N = input.Length;
            float[] output = new float[N];

            for (int i = 0; i < input.Length; i++)
            {
                output[i] = (float)input[i];
            }

            return output;
        }
    }
}

I then get:

From this it looks like the dumb safe variant is faster than any unsafe variant, although bounds check elimination in the unsafe methods should make them at least as fast, if not faster. Just for fun I also compiled the same IL code via LCG (DynamicMethod), which seems to be even slower than any of these methods, although the additional cost of the delegate invocation does not seem to play such a big role here. The sketch below shows roughly how that variant was built.
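For reference, this is roughly how the LCG variant can be built (a sketch from memory, not the exact code I benchmarked; the helper name CreateLcgCast is made up). It emits the same element-by-element conversion loop as CastSafe into a DynamicMethod and returns it as a delegate:

using System;
using System.Reflection.Emit;

public static Func<int[], float[]> CreateLcgCast()
{
    var dm = new DynamicMethod("CastLcg", typeof(float[]), new[] { typeof(int[]) });
    ILGenerator il = dm.GetILGenerator();

    LocalBuilder output = il.DeclareLocal(typeof(float[]));
    LocalBuilder i = il.DeclareLocal(typeof(int));
    Label loopCheck = il.DefineLabel();
    Label loopBody = il.DefineLabel();

    // float[] output = new float[input.Length];
    il.Emit(OpCodes.Ldarg_0);
    il.Emit(OpCodes.Ldlen);
    il.Emit(OpCodes.Conv_I4);
    il.Emit(OpCodes.Newarr, typeof(float));
    il.Emit(OpCodes.Stloc, output);

    // int i = 0;
    il.Emit(OpCodes.Ldc_I4_0);
    il.Emit(OpCodes.Stloc, i);
    il.Emit(OpCodes.Br, loopCheck);

    // output[i] = (float)input[i];
    il.MarkLabel(loopBody);
    il.Emit(OpCodes.Ldloc, output);
    il.Emit(OpCodes.Ldloc, i);
    il.Emit(OpCodes.Ldarg_0);
    il.Emit(OpCodes.Ldloc, i);
    il.Emit(OpCodes.Ldelem_I4);
    il.Emit(OpCodes.Conv_R4);
    il.Emit(OpCodes.Stelem_R4);

    // i++;
    il.Emit(OpCodes.Ldloc, i);
    il.Emit(OpCodes.Ldc_I4_1);
    il.Emit(OpCodes.Add);
    il.Emit(OpCodes.Stloc, i);

    // while (i < input.Length)
    il.MarkLabel(loopCheck);
    il.Emit(OpCodes.Ldloc, i);
    il.Emit(OpCodes.Ldarg_0);
    il.Emit(OpCodes.Ldlen);
    il.Emit(OpCodes.Conv_I4);
    il.Emit(OpCodes.Blt, loopBody);

    il.Emit(OpCodes.Ldloc, output);
    il.Emit(OpCodes.Ret);

    return (Func<int[], float[]>)dm.CreateDelegate(typeof(Func<int[], float[]>));
}

// Usage: var lcgCast = CreateLcgCast(); float[] arr = lcgCast(array);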

The for loop executes this code 10 million times, which should produce stable results. Why am I seeing any differences here at all? Using realtime process priority did not help either (psexec -realtime executable). How can I get reliable numbers?
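For completeness, roughly the same effect can also be had from inside the process (just a sketch; I use High instead of RealTime so the machine stays usable, and the affinity mask is only an example):

// At the start of Main (needs System.Diagnostics and System.Threading):
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
Thread.CurrentThread.Priority = ThreadPriority.Highest;
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(1); // pin to CPU 0 to avoid core migrations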

My tests did include

If I use a profiler I am not sure whether it will distort the measurements even more. Since it interrupts my application from time to time to collect call stacks, it will certainly destroy any cache locality which might aid performance. So if there is an approach with better (data) cache locality, I will not be able to find it with a profiler.

Edit 1: To take into account that I do not have a real-time OS, I now sample my measurements. Since one thread gets a 15ms time slice from the Windows scheduler, I can keep the scheduler out of the measurement if I measure for less than 15ms. But if I measure for too short a time, I end up with very small tick counts which do not tell me much.

To get stable values I need a time span long enough to let the OS do whatever it does on a regular basis. Empirical tests have shown that 30+ seconds is a good duration for one measurement.

This time span is then divided into sample intervals that are well below 15ms, and I get timing information for each sample. From the samples I can extract min/max and average values; this way I can also see first-time initialization effects. The code now looks like this:

class Program
{
    const int RUNS = 100 * 1000 * 1000; // 100 million runs will take about 30s
    const int RunsPerSample = 100;      // 100 runs for one sample take about 0,01ms << 15ms

    static void Main(string[] args)
    {
        int[] array = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43 };
        long[] sampleTimes = new long [RUNS/RunsPerSample];

        int sample = 0;
        for (int i = 0; i < RUNS; i+=RunsPerSample)
        {
            var sw = Stopwatch.StartNew();
            for (int j = i; j < i+RunsPerSample; j++)
            {
                float[] arr = Cast(array);
            }
            sw.Stop();
            sampleTimes[sample] = sw.ElapsedTicks;
            sample++;
        }
        Console.WriteLine("SampleSize: {0}, Min {1}, Max {2}, Average {3}",
            RunsPerSample, sampleTimes.Min(), sampleTimes.Max(), sampleTimes.Average());

The values from these tests still vary (<10%), but I think that if I create a histogram of my values and drop the 10% highest values, which are likely caused by the OS, GC, ..., I can get really stable numbers which I can trust.
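A simple way to express that cut in code (the fixed 10% drop fraction is just an assumption on my part; the histogram code below uses a different filter):

// Drop the slowest dropFraction of the samples before averaging (needs using System.Linq).
static double TrimmedAverage(long[] sampleTimes, double dropFraction)
{
    long[] sorted = sampleTimes.OrderBy(t => t).ToArray();
    int keep = (int)(sorted.Length * (1.0 - dropFraction));
    return sorted.Take(keep).Average();
}

// Usage: double avg = TrimmedAverage(sampleTimes, 0.1); // drop the highest 10%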

SampleSize: 100, Min 25, Max 86400, Average 28,614631

Edit 2: The histograms show that the measured values are not random. They look like a Landau distribution, which, with the right approximation algorithms, should give me stable values. I wish something like ROOT existed for .NET, where I could interactively fit the right distribution function to my data and get the results.

Measured Values Histogram

The code to generate the histogram with the MSChart controls is below:

using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Windows.Forms;
using System.Windows.Forms.DataVisualization.Charting;

namespace ConsoleApplication4
{
    public partial class Histogram : Form
    {
        public Histogram(long [] sampleTimes)
        {
            InitializeComponent();

            Series  histogramSeries = cHistogram.Series.Add("Histogram");

            // Set new series chart type and other attributes
            histogramSeries.ChartType = SeriesChartType.Column;
            histogramSeries.BorderColor = Color.Black;
            histogramSeries.BorderWidth = 1;
            histogramSeries.BorderDashStyle = ChartDashStyle.Solid;

            var filtered = RemoveHighValues(sampleTimes, 40);
            KeyValuePair<long,int>[] histoData = GenerateHistogram(filtered);

            ChartArea chartArea = cHistogram.ChartAreas[histogramSeries.ChartArea];
            chartArea.AxisY.Title = "Frequency";

            chartArea.AxisX.Minimum = histoData.Min( x=>x.Key );
            chartArea.AxisX.Maximum = histoData.Max( x=>x.Key );

            foreach (var v in histoData)
            {
                histogramSeries.Points.Add(new DataPoint(v.Key, v.Value));
            }

            chartArea.AxisY.Minimum = 0;
            chartArea.AxisY.Maximum = histoData[0].Value + 100;
        }

        // Count the occurrence of each value in the input and return an array with the value as key and its count as value,
        // ordered as a list starting with the highest counts.
        KeyValuePair<long,int>[] GenerateHistogram(long [] input)
        {
            Dictionary<long, int> counts = new Dictionary<long, int>();
            foreach (var value in input)
            {
                int old;
                counts.TryGetValue(value, out old); // old stays 0 when the value is not yet present
                counts[value] = old + 1;
            }

            var orderedCounts = (from x in counts
                                 orderby x.Value descending
                                 select x).ToArray();

            return orderedCounts;
        }

        long[] RemoveHighValues(long[] input, int maxDifference)
        {
            var min = input.Min();
            var max = input.Max();

            long[] filtered = input;

            while (max - min > maxDifference) // remove all values which differ from the minimum by more than maxDifference ticks
            {
                filtered = input.Where(x => x < max).ToArray();
                max = filtered.Max();
            }

            return filtered;

        }
    }
}
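Until I find a proper fitting library, a crude stand-in for the fit is to take the histogram peak (the most frequent tick value) as the most probable timing. Since GenerateHistogram already orders by descending count, that is just the first entry (a rough substitute for a real Landau fit, not a replacement for it):

// Inside the Histogram constructor, after GenerateHistogram has produced histoData:
long mostProbableTicks = histoData[0].Key;   // the peak of the distribution
Console.WriteLine("Most probable sample time: {0} ticks", mostProbableTicks);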

Upvotes: 6

Views: 462

Answers (4)

Alois Kraus

Reputation: 13535

I have amended my original question with the finding that the numbers are not random but follow a distribution (it looks like a Landau distribution), so I can use fitting algorithms to extract the peak value, i.e. the most likely true timing.

Upvotes: 1

Gilad

Reputation: 2886

Stopwatch isn't that accurate; try using HighResClock:

http://netcode.ru/dotnet/?lang=&katID=30&skatID=261&artID=7113

Don't expect measurements accurate to the nanosecond; as somebody else wrote, Win7 isn't a real-time OS.

Also, after GC.Collect() you might want to put GC.WaitForPendingFinalizers();
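A typical sequence to get the heap quiet before starting a measurement looks like this:

// Flush the GC before timing: collect, let finalizers run, then collect what they freed.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();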

Upvotes: 0

LukeH

Reputation: 269278

You're talking about an average difference of about a hundredth of a nanosecond per method call. Windows doesn't claim to be a realtime OS; these measurements are about as stable as you'll get.

And, by the way, the JITer will eliminate the bounds check inside your CastSafe method. I'd be very surprised if you can find anything faster than that.

(If the bottleneck is CPU then you might improve the performance by using Parallel.For rather than a plain for loop, but to determine that you'd need to test against real-world data. For example, cache behaviour is going to be vastly different for an array of 43 ints than for an array of 43,000,000 ints.)
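Something along these lines (just a sketch, and only worth trying for much larger arrays than 43 elements):

// Parallel version of the conversion; the per-element work is tiny, so this only
// pays off when the array is large enough to amortize the Parallel.For overhead.
public static float[] CastParallel(int[] input)
{
    float[] output = new float[input.Length];
    System.Threading.Tasks.Parallel.For(0, input.Length, i =>
    {
        output[i] = input[i];
    });
    return output;
}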

Upvotes: 4

user unknown

Reputation: 36229

I guess it would run with Mono under Linux? To avoid the influence of the multitasking environment, you can start any program with

time program

and get a measurement of how much CPU time your program used.

You're measuring the warm-up phase and load time too, but if you have enough iterations in the loop, that shouldn't be much of a problem. Maybe there is an equivalent tool on the Windows platform?
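From within .NET you can at least get the same kind of number for the current process (a sketch, not a replacement for the time tool):

// Report how much CPU time the current process has consumed so far.
using System;
using System.Diagnostics;

static class CpuTimeReport
{
    public static void Print()
    {
        Process p = Process.GetCurrentProcess();
        Console.WriteLine("User CPU:   {0} ms", p.UserProcessorTime.TotalMilliseconds);
        Console.WriteLine("Kernel CPU: {0} ms", p.PrivilegedProcessorTime.TotalMilliseconds);
        Console.WriteLine("Total CPU:  {0} ms", p.TotalProcessorTime.TotalMilliseconds);
    }
}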

Upvotes: 0
