Reputation: 3004
I have rewritten a computation library to improve memory management and have discovered that this has resulted in a speed increase. In the original it uses an array whose members are 12 doubles (so 96 bytes) apart in memory, whereas my array is contiguous.
How much of a speed increase would this difference provide?
Upvotes: 1
Views: 1558
Reputation: 9021
I have created a small test program that calculates the array element access times for 1D and 2D arrays. It is a Console
application in C# .NET built in Release
mode (with optimizations enabled). If the size of the 1D array is m
then the size of the 2D array is m x m
.
public class PhysicsCalculator
{
public const Int32 ArrayDimension = 1000;
public long CalculateSingleDimensionPerformance()
{
var arr = new double[ArrayDimension];
var stopwatch = new Stopwatch();
stopwatch.Start();
for (Int32 i = 0; i < ArrayDimension; i++)
{
arr[i] = i;
}
stopwatch.Stop();
return stopwatch.ElapsedTicks;
}
public long CalculateDoubleDimensionPerformance()
{
var arr = new double[ArrayDimension, ArrayDimension];
var stopwatch = new Stopwatch();
stopwatch.Start();
for (Int32 i = 0; i < ArrayDimension; i++)
{
arr[i, 5] = i;
}
stopwatch.Stop();
return stopwatch.ElapsedTicks;
}
}
class Program
{
static void Main(string[] args)
{
var physicsCalculator = new PhysicsCalculator();
// This is a dummy call to tell the runtime to jit the methods before hand (to avoid jitting on first call)
physicsCalculator.CalculateSingleDimensionPerformance();
physicsCalculator.CalculateDoubleDimensionPerformance();
Console.WriteLine("Number of ticks per seconds = " + new TimeSpan(0, 0, 1).Ticks);
Console.WriteLine();
const int numberOfRepetetions = 1000;
long elapsedTicks = 0;
for (var i = 0; i < numberOfRepetetions; i++)
{
elapsedTicks += physicsCalculator.CalculateSingleDimensionPerformance();
}
Console.WriteLine("1D array : ");
GenerateReport(elapsedTicks, numberOfRepetetions);
elapsedTicks = 0;
for (var i = 0; i < numberOfRepetetions; i++)
{
elapsedTicks += physicsCalculator.CalculateDoubleDimensionPerformance();
}
Console.WriteLine("2D array : ");
GenerateReport(elapsedTicks, numberOfRepetetions);
// Wait before exit
Console.Read();
}
private static void GenerateReport(long elapsedTicks, int numberOfRepetetions)
{
var oneSecond = new TimeSpan(0, 0, 1);
Console.WriteLine("Array size = " + PhysicsCalculator.ArrayDimension);
Console.WriteLine("Ticks (avg) = " + elapsedTicks / numberOfRepetetions);
Console.WriteLine("Ticks (for {0} repetitions) = {1}", numberOfRepetetions, elapsedTicks);
Console.WriteLine("Time taken (avg) = {0} ms", (elapsedTicks * oneSecond.TotalMilliseconds) / (numberOfRepetetions * oneSecond.Ticks));
Console.WriteLine("Time taken (for {0} repetitions) = {1} ms", numberOfRepetetions,
(elapsedTicks * oneSecond.TotalMilliseconds) / oneSecond.Ticks);
Console.WriteLine();
}
}
The results on my machine (2.8 GHz Phenom II quad core, 8 GB DDR2 800 MHz RAM, Windows 7 Ultimate x64) are
Number of ticks per seconds = 10000000
1D array : Array size = 1000
Ticks (avg) = 52
Ticks (for 1000 repetitions) = 52598
Time taken (avg) = 0.0052598 ms
Time taken (for 1000 repetitions) = 5.2598 ms
2D array : Array size = 1000
Ticks (avg) = 13829
Ticks (for 1000 repetitions) = 13829984
Time taken (avg) = 1.3829984 ms
Time taken (for 1000 repetitions) = 1382.9984 ms
Interestingly, the results are quite clear, the access times for 2D array elements is significantly greater than those for 1D array elements.
Determining whether time taken is a function of array size
100
Number of ticks per seconds = 10000000 1D array : Array size = 100 Ticks (avg) = 20 Ticks (for 1000 repetitions) = 20552 Time taken (avg) = 0.0020552 ms Time taken (for 1000 repetitions) = 2.0552 ms 2D array : Array size = 100 Ticks (avg) = 326 Ticks (for 1000 repetitions) = 326039 Time taken (avg) = 0.0326039 ms Time taken (for 1000 repetitions) = 32.6039 ms
20
Number of ticks per seconds = 10000000 1D array : Array size = 20 Ticks (avg) = 16 Ticks (for 1000 repetitions) = 16653 Time taken (avg) = 0.0016653 ms Time taken (for 1000 repetitions) = 1.6653 ms 2D array : Array size = 20 Ticks (avg) = 21 Ticks (for 1000 repetitions) = 21147 Time taken (avg) = 0.0021147 ms Time taken (for 1000 repetitions) = 2.1147 ms
12
(your use case)Number of ticks per seconds = 10000000 1D array : Array size = 12 Ticks (avg) = 16 Ticks (for 1000 repetitions) = 16548 Time taken (avg) = 0.0016548 ms Time taken (for 1000 repetitions) = 1.6548 ms 2D array : Array size = 12 Ticks (avg) = 20 Ticks (for 1000 repetitions) = 20762 Time taken (avg) = 0.0020762 ms Time taken (for 1000 repetitions) = 2.0762 ms
As you can see, the array size does make a difference in the element access time. But, in your use case of array size as 12
, the difference is about (0.0016548 ms
for 1D vs 0.0020762 ms
for 2D) 25
% i.e. 1D access is 25% faster than 2D access.
When the lower array size is smaller in case of 2D array
In the above samples, if the size of the 1D array is m
then the size of the 2D array is m x m
. When the size of the 2D array is reduced to m x 2
, I get the following results for m = 12
1D array : Array size = 12 Ticks (avg) = 16 Ticks (for 1000 repetitions) = 16100 Time taken (avg) = 0.00161 ms Time taken (for 1000 repetitions) = 1.61 ms 2D array : Array size = 12 x 2 Ticks (avg) = 16 Ticks (for 1000 repetitions) = 16324 Time taken (avg) = 0.0016324 ms Time taken (for 1000 repetitions) = 1.6324 ms
In this case, the difference is hardly 1.3%.
To gauge the performance in your system, I would suggest that you convert the above code in FORTRAN and run the benchmarks with actual values.
Upvotes: 1