Speed of reading from contiguous vs non-contiguous arrays

Question

I have rewritten a computation library to improve memory management and have discovered that this has resulted in a speed increase. In the original it uses an array whose members are 12 doubles (so 96 bytes) apart in memory, whereas my array is contiguous.

How much of a speed increase would this difference provide?

Devendra D. Chavan · Accepted Answer

I have created a small test program that calculates the array element access times for 1D and 2D arrays. It is a Console application in C# .NET built in Release mode (with optimizations enabled). If the size of the 1D array is m then the size of the 2D array is m x m.

public class PhysicsCalculator
{
    public const Int32 ArrayDimension = 1000;

    public long CalculateSingleDimensionPerformance()
    {
        var arr = new double[ArrayDimension];

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        for (Int32 i = 0; i < ArrayDimension; i++)
        {
            arr[i] = i;
        }

        stopwatch.Stop();

        return stopwatch.ElapsedTicks;
    }

    public long CalculateDoubleDimensionPerformance()
    {
        var arr = new double[ArrayDimension, ArrayDimension];

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        for (Int32 i = 0; i < ArrayDimension; i++)
        {
            arr[i, 5] = i;
        }

        stopwatch.Stop();

        return stopwatch.ElapsedTicks;
    }
}

class Program
{
    static void Main(string[] args)
    {
        var physicsCalculator = new PhysicsCalculator();

        // This is a dummy call to tell the runtime to jit the methods before hand (to avoid jitting on first call)
        physicsCalculator.CalculateSingleDimensionPerformance();
        physicsCalculator.CalculateDoubleDimensionPerformance();

        Console.WriteLine("Number of ticks per seconds = " + new TimeSpan(0, 0, 1).Ticks);
        Console.WriteLine();

        const int numberOfRepetetions = 1000;
        long elapsedTicks = 0;

        for (var i = 0; i < numberOfRepetetions; i++)
        {
            elapsedTicks += physicsCalculator.CalculateSingleDimensionPerformance();
        }

        Console.WriteLine("1D array : ");
        GenerateReport(elapsedTicks, numberOfRepetetions);

        elapsedTicks = 0;
        for (var i = 0; i < numberOfRepetetions; i++)
        {
            elapsedTicks += physicsCalculator.CalculateDoubleDimensionPerformance();
        }

        Console.WriteLine("2D array : ");
        GenerateReport(elapsedTicks, numberOfRepetetions);

        // Wait before exit
        Console.Read();
    }

    private static void GenerateReport(long elapsedTicks, int numberOfRepetetions)
    {
        var oneSecond = new TimeSpan(0, 0, 1);

        Console.WriteLine("Array size = " + PhysicsCalculator.ArrayDimension);
        Console.WriteLine("Ticks (avg) = " + elapsedTicks / numberOfRepetetions);
        Console.WriteLine("Ticks (for {0} repetitions) = {1}", numberOfRepetetions, elapsedTicks);
        Console.WriteLine("Time taken (avg) = {0} ms", (elapsedTicks * oneSecond.TotalMilliseconds) / (numberOfRepetetions * oneSecond.Ticks));
        Console.WriteLine("Time taken (for {0} repetitions) = {1} ms", numberOfRepetetions,
           (elapsedTicks * oneSecond.TotalMilliseconds) / oneSecond.Ticks);

        Console.WriteLine();
    }
}

The results on my machine (2.8 GHz Phenom II quad core, 8 GB DDR2 800 MHz RAM, Windows 7 Ultimate x64) are

Number of ticks per seconds = 10000000

1D array : Array size = 1000 
Ticks (avg) = 52 
Ticks (for 1000 repetitions) = 52598 
Time taken (avg) = 0.0052598 ms 
Time taken (for 1000 repetitions) = 5.2598 ms

2D array : Array size = 1000 
Ticks (avg) = 13829
Ticks (for 1000 repetitions) = 13829984 
Time taken (avg) = 1.3829984 ms 
Time taken (for 1000 repetitions) = 1382.9984 ms

Interestingly, the results are quite clear, the access times for 2D array elements is significantly greater than those for 1D array elements.

Determining whether time taken is a function of array size

For array size of 100

Number of ticks per seconds = 10000000

1D array : 
Array size = 100
Ticks (avg) = 20
Ticks (for 1000 repetitions) = 20552
Time taken (avg) = 0.0020552 ms
Time taken (for 1000 repetitions) = 2.0552 ms

2D array : 
Array size = 100
Ticks (avg) = 326
Ticks (for 1000 repetitions) = 326039
Time taken (avg) = 0.0326039 ms
Time taken (for 1000 repetitions) = 32.6039 ms

For array size of 20

Number of ticks per seconds = 10000000

1D array : 
Array size = 20
Ticks (avg) = 16
Ticks (for 1000 repetitions) = 16653
Time taken (avg) = 0.0016653 ms
Time taken (for 1000 repetitions) = 1.6653 ms

2D array : 
Array size = 20
Ticks (avg) = 21
Ticks (for 1000 repetitions) = 21147
Time taken (avg) = 0.0021147 ms
Time taken (for 1000 repetitions) = 2.1147 ms

For array size of 12 (your use case)

Number of ticks per seconds = 10000000

1D array : 
Array size = 12
Ticks (avg) = 16
Ticks (for 1000 repetitions) = 16548
Time taken (avg) = 0.0016548 ms
Time taken (for 1000 repetitions) = 1.6548 ms

2D array : 
Array size = 12
Ticks (avg) = 20
Ticks (for 1000 repetitions) = 20762
Time taken (avg) = 0.0020762 ms
Time taken (for 1000 repetitions) = 2.0762 ms

As you can see, the array size does make a difference in the element access time. But, in your use case of array size as 12, the difference is about (0.0016548 ms for 1D vs 0.0020762 ms for 2D) 25% i.e. 1D access is 25% faster than 2D access.

When the lower array size is smaller in case of 2D array

In the above samples, if the size of the 1D array is m then the size of the 2D array is m x m. When the size of the 2D array is reduced to m x 2, I get the following results for m = 12

1D array : 
Array size = 12
Ticks (avg) = 16
Ticks (for 1000 repetitions) = 16100
Time taken (avg) = 0.00161 ms
Time taken (for 1000 repetitions) = 1.61 ms

2D array : 
Array size = 12 x 2
Ticks (avg) = 16
Ticks (for 1000 repetitions) = 16324
Time taken (avg) = 0.0016324 ms
Time taken (for 1000 repetitions) = 1.6324 ms

In this case, the difference is hardly 1.3%.

To gauge the performance in your system, I would suggest that you convert the above code in FORTRAN and run the benchmarks with actual values.

Speed of reading from contiguous vs non-contiguous arrays

Answers (1)

Related Questions