AMissico
AMissico

Reputation: 21684

What is the fastest way to Initialize a multi-dimensional array to non-default values in .NET?

How do I initialize a multi-dimensional array of a primitive type as fast as possible?

I am stuck with using multi-dimensional arrays. My problem is performance. The following routine initializes a 100x100 array in approx. 500 ticks. Removing the int.MaxValue initialization results in approx. 180 ticks just for the looping. Approximately 100 ticks to create the array without looping and without initializing to int.MaxValue.

I am open to suggestions on how to optimize this non-default initialization of an array. One idea I had is to use a smaller primitive type when available. For instance, using byte instead of int, saves 100 ticks. I would be happy with this, but I am hoping that I don't have to change the primitive data type.

    public int[,] CreateArray(Size size) {
        int[,] array = new int[size.Width, size.Height];
        for (int x = 0; x < size.Width; x++) {
            for (int y = 0; y < size.Height; y++) {
                array[x, y] = int.MaxValue;
            }
        }
        return array;
    }

Down to 450 ticks with the following:

    public int[,] CreateArray1(Size size) {
        int iX = size.Width;
        int iY = size.Height;
        int[,] array = new int[iX, iY];
        for (int x = 0; x < iX; x++) {
            for (int y = 0; y < iY; y++) {
                array[x, y] = int.MaxValue;
            }
        }
        return array;
    }

CreateArray5; Accepted Implementation: Limitation: Unable to Resize, can be changed

Down to approximately 165 ticks after a one-time initialization of 2800 ticks. (See my answer below.) If I can get stackalloc to work with multi-dimensional arrays, I should be able to get the same performance without having to intialize the private static array.

    private static bool _arrayInitialized5;
    private static int[,] _array5;

    public static int[,] CreateArray5(Size size) {
        if (!_arrayInitialized5) {
            int iX = size.Width;
            int iY = size.Height;
            _array5 = new int[iX, iY];
            for (int x = 0; x < iX; x++) {
                for (int y = 0; y < iY; y++) {
                    _array5[x, y] = int.MaxValue;
                }
            }
            _arrayInitialized5 = true;
        }
        return (int[,])_array5.Clone();
    }

CreateArray8; Accepted Implemention; Limitation: Requires Full Trust

Down to approximately 165 ticks without using the "clone technique" above. (See my answer below.) I am sure I can get the ticks lower, if I can just figure out the return of CreateArray9.

    public unsafe static int[,] CreateArray8(Size size) {
        int iX = size.Width;
        int iY = size.Height;
        int[,] array = new int[iX, iY];
        fixed (int* pfixed = array) {
            int count = array.Length;
            for (int* p = pfixed; count-- > 0; p++)
                *p = int.MaxValue;
        }
        return array;
    }

Notes

I am providing all code and notes regarding this question as answers. Hopefully, it will save someone's time in the future.

Arrays allocated on the Large Object Heap (LOH) are not part of this discussion. The performance improvements noted hear are only for arrays allocated on the heap.

Stackalloc

My idea of using stackalloc to eliminate initializing array to default values did not work out because the allocated stack memory must be copied out of the method. Meaning, I would have to create another array to hold the results. This array would be initialized defeating the whole purpose of using stackalloc.

CreateArray8; unsafe/fixed method

The CLR will only execute unsafe code if it is in a fully trusted assembly.

CreateArray5; clone method

Requires variables to determine if array is initialized and to store the initialized array. Performance is the same as the unsafe/fixed method after initialization. See Dan Tao's answer for possible solution.

300% Performance Increase?

I suck at percentages, but 300% is what I figured (500 to 165 ticks).


Final Implementation for Application

For this application, I settled on using the "clone" method. Following is a "lean" Generic implementation used in the application with performance samples.

Initialization:

Use:

Upvotes: 5

Views: 4651

Answers (7)

AMissico
AMissico

Reputation: 21684

Class and Generic using "Clone" method.

  • MDArray; clone class initalize: 2444, 2587, 2421, 2406
  • MDArray; clone class: 440, 362, 198, 139
  • Grid<int>; generic clone class initalize: 5344, 5334, 5693, 5272
  • Grid<int>; generic clone class: 187, 204, 199, 288
  • Grid<bool>; generic clone class initalize: 3585, 3537, 3552, 3569
  • Grid<bool>; generic clone class: 37, 44, 36, 43
  • Grid<Color>; generic clone class initalize: 4139, 3536, 3503, 3533
  • Grid<Color>; generic clone class: 2737, 3137, 2414, 2171

[TestClass]
public class CreateArrayTest {

    public class MDArray {
        private bool _initialized;
        private int[,] _array;
        private int _x;
        private int _y;
        private int _value;
        public MDArray(Size size, int value, bool initialize) {
            _x = size.Width;
            _y = size.Height;
            _value = value;
            if (initialize) {
                InitializeArray();
            }
        }
        private void InitializeArray() {
            int iX = _x;
            int iY = _y;
            _array = new int[iX, iY];
            for (int y = 0; y < iY; y++) {
                for (int x = 0; x < iX; x++) {
                    _array[x, y] = _value;
                }
            }
            _initialized = true;
        }
        public int[,] CreateArray() {
            if (!_initialized) {
                InitializeArray();
            }
            return (int[,])_array.Clone();
        }
    }

    public class Grid<T> {
        private T[,] _array;
        private T _value;
        private bool _initialized;
        private int _x;
        private int _y;
        public Grid(Size size, T value, bool initialize) {
            _x = size.Width;
            _y = size.Height;
            _value = value;
            if (initialize) {
                InitializeArray();
            }
        }
        private void InitializeArray() {
            int iX = _x;
            int iY = _y;
            _array = new T[iX, iY];
            for (int y = 0; y < iY; y++) {
                for (int x = 0; x < iX; x++) {
                    _array[x, y] = _value;
                }
            }
            _initialized = true;
        }
        public T[,] CreateArray() {
            if (!_initialized) {
                InitializeArray();
            }
            return (T[,])_array.Clone();
        }
    }

    [TestMethod()]
    public void CreateArray_Test() {

        int[,] actual;

        int iHi = 10000 * 10 * 2; //'absolute minimum times array will be created   (200,000)
        //                          'could be as high as 10000 * 10 * 20? * 50? (100,000,000?)

        Stopwatch o;

        o = Stopwatch.StartNew();
        MDArray oMDArray = new MDArray(new Size(100, 100), int.MaxValue, true);
        o.Stop();
        Trace.WriteLine(o.ElapsedTicks, "     MDArray; clone class initalize");
        o = Stopwatch.StartNew();
        for (int i = 0; i < iHi; i++) { actual = oMDArray.CreateArray(); }
        o.Stop();
        Trace.WriteLine(o.ElapsedTicks / iHi, "     MDArray; clone class");

        o = Stopwatch.StartNew();
        Grid<int> oIntMap = new Grid<int>(new Size(100, 100), int.MaxValue, true);
        o.Stop();
        Trace.WriteLine(o.ElapsedTicks, "   Grid<int>; generic clone class initalize");
        o = Stopwatch.StartNew();
        for (int i = 0; i < iHi; i++) { actual = oIntMap.CreateArray(); }
        o.Stop();
        Trace.WriteLine(o.ElapsedTicks / iHi, "   Grid<int>; generic clone class");

        bool[,] fActual;
        o = Stopwatch.StartNew();
        Grid<bool> oBolMap = new Grid<bool>(new Size(100, 100), true, true);
        o.Stop();
        Trace.WriteLine(o.ElapsedTicks, "  Grid<bool>; generic clone class initalize");
        o = Stopwatch.StartNew();
        for (int i = 0; i < iHi; i++) { fActual = oBolMap.CreateArray(); }
        o.Stop();
        Trace.WriteLine(o.ElapsedTicks / iHi, "  Grid<bool>; generic clone class");

        Color[,] oActual;
        o = Stopwatch.StartNew();
        Grid<Color> oColMap = new Grid<Color>(new Size(100, 100), Color.AliceBlue, true);
        o.Stop();
        Trace.WriteLine(o.ElapsedTicks, " Grid<Color>; generic clone class initalize");
        o = Stopwatch.StartNew();
        for (int i = 0; i < iHi; i++) { oActual = oColMap.CreateArray(); }
        o.Stop();
        Trace.WriteLine(o.ElapsedTicks / iHi, " Grid<Color>; generic clone class");
    }
}

Upvotes: 0

Dan Tao
Dan Tao

Reputation: 128327

A note on your Clone approach: I doubt you will be able to beat it, in terms of performance. However, it could be a breaking change considering that after the first initialization, it disregards the Size parameter and just returns an array of the same size on every call. Depending on whether that actually matters in your scenario, you could either:

  1. Stick with it, because it doesn't matter.
  2. Create a Dictionary<Size, int[,]> (I believe Size would behave properly as a key -- haven't tested) to pre-initialize an array every time a unique Size is requested. The overhead of this I am not sure of.
  3. Abandon the Clone idea.

In case you end up having to go with 3 above, here are a few borderline ridiculous suggestions:

1. Cache your Width and Height properties locally, rather than accessing them from the Size struct on each iteration.

static int[,] CreateArray(Size size) {
    int w = size.Width;
    int h = size.Height;

    int[,] array = new int[w, h];
    for (int x = 0; x < w; x++) {
        for (int y = 0; y < h; y++) {
            array[x, y] = int.MaxValue;
        }
    }

    return array;
}

To create a 1000x1000 array, on my machine, this resulted in an average execution time of about 120000 ticks versus about 140000 ticks.

2. Leverage multiple cores if you have them and initialize the array in parallel.

static int[,] CreateArray(Size size) {
    int w = size.Width;
    int h = size.Height;

    int[,] array = new int[w, h];
    Action<int[,], int, int> fillFirstHalf = FillArray;
    Action<int[,], int, int> fillSecondHalf = FillArray;

    var firstResult = fillFirstHalf.BeginInvoke(array, 0, h / 2, null, null);
    var secondResult = fillSecondHalf.BeginInvoke(array, h / 2, h, null, null);

    fillFirstHalf.EndInvoke(firstResult);
    fillSecondHalf.EndInvoke(secondResult);

    return array;
}

static void FillArray(int[,] array, int ystart, int yend) {
    int w = array.GetLength(0);

    for (int x = 0; x < w; ++x) {
        for (int y = ystart; y < yend; ++y) {
            array[x, y] = int.MaxValue;
        }
    }
}

This one probably isn't a very realistic suggestion in your scenario, since it seems that you're only creating 100x100 arrays, in which case the overhead of the parallelization exceeds the performance gain. However, for creating a 1000x1000 array, I found that this approach reduced my execution times down to about 70k ticks on average (compared to the ~120k ticks I got from the first optimization I suggested).

Also, if you are creating many arrays this way, I would highly recommend parallelizing that (i.e., if you need to create a thousand arrays, create 500 each from two threads), assuming you have multiple processors to do the work for you. Without multiple processors, forget it; adding threads will only hurt your performance.

3. Get enhanced performance by using an unsafe pointer.

Now here's an interesting discovery: it appears that a two-dimensional array in .NET is allocated in a predictable way*: basically as a one-dimensional block of memory, where each "row" is offset from the starting point by an amount equivalent to the length of all previous rows. In other words, a 10x2 array can be accessed using pointer just like a 20x1 array; a 10x10 array can be accessed like a 100x1 array, etc.

I have no idea if this is documented behavior or not. It may be an unspecified implementation detail that you don't want to depend on. Either way, it's worth looking into.

* It's possible that most other .NET developers already knew this and I'm just stating the obvious, in which case, I rescind my comment about this being "interesting".

In any case, knowledge of this allows you to exploit the fixed keyword in an unsafe context for a significant performance gain:

static int[,] CreateArray(Size size) {
    int w = size.Width;
    int h = size.Height;

    int[,] array = new int[w, h];
    unsafe {
        fixed (int* ptr = array) {
            for (int i = 0; i < w * h; ++i)
                ptr[i] = int.MaxValue;
        }
    }

    return array;
}

For initializing arrays of a signifcant size, I would even recommend combining the above approach (parallelization) with this one -- so, keep the same CreateArray from suggestion #2, and then rewrite FillArray as:

static void FillArray(int[,] array, int ystart, int yend) {
    int w = array.GetLength(0);

    unsafe {
        fixed (int* p = array) {
            for (int i = w * ystart; i < w * yend; ++i)
                p[i] = int.MaxValue;
        }
    } 
}

It actually seems that you already figured out this last part before I posted this, but I thought I'd include it anyway mainly for the point about combining unsafe with parallelization.


A note on stackalloc: I think you may be chasing after the leprechaun at the end of the rainbow with this one. According to the documentation on stackalloc:

A block of memory of sufficient size to contain expr elements of type type is allocated on the stack, not the heap; the address of the block is stored in pointer ptr. This memory is not subject to garbage collection and therefore does not have to be pinned (via fixed). The lifetime of the memory block is limited to the lifetime of the method in which it is defined. (emphasis mine)

This leads me to believe that you cannot return an object whose data is stored in memory allocated by stackalloc from a function, because that memory is only allocated for the lifetime of the function.

Upvotes: 4

AMissico
AMissico

Reputation: 21684

I was able to get the ticks down to approximately 165. See CreateArray8 below.

I got an idea from the "Range Check Elimination" section of the CodeProject article at http://www.codeproject.com/KB/dotnet/arrays.aspx provided by jdk. (@jdk, thank you very much.) The idea was to eliminate the range checking by using a pointer and initializing each element in one loop. I was able to get the ticks down to approximately 165. Just as good as cloning without the delay of pre-initialization and the supporting static variables. (See my other answer.)

I bet I can cut this in half, if I can just figure out the return of CreateArray9.

  • CreateArray3; static unsafe: 501, 462, 464, 462
  • CreateArray7; static unsafe for(y,x): 452, 451, 444, 429
  • CreateArray8; static unsafe pointer single_for: 187, 173, 156, 145

[TestClass]
public class CreateArrayTest {

    public static unsafe int[,] CreateArray3(Size size) {
        int iX = size.Width;
        int iY = size.Height;
        int[,] array = new int[iX, iY];
        for (int x = 0; x < iX; x++) {
            for (int y = 0; y < iY; y++) {
                array[x, y] = int.MaxValue;
            }
        }
        return array;
    }

    public unsafe static int[,] CreateArray7(Size size) {
        int iX = size.Width;
        int iY = size.Height;
        int[,] array = new int[iX, iY];
        for (int y = 0; y < iY; y++) {
            for (int x = 0; x < iX; x++) {
                array[x, y] = int.MaxValue;
            }
        }
        return array;
    }

    public unsafe static int[,] CreateArray8(Size size) {
        int iX = size.Width;
        int iY = size.Height;
        int[,] array = new int[iX, iY];
        fixed (int* pfixed = array) {
            int count = array.Length;
            for (int* p = pfixed; count-- > 0; p++)
                *p = int.MaxValue;
        }
        return array;
    }

    public unsafe static int[,] CreateArray9(Size size) {
        int iX = size.Width;
        int iY = size.Height;
        void* array = stackalloc int[iX * iY];
        int count = iX * iY;
        for (int* p = (int*)array; count-- > 0; p++)
            *p = int.MaxValue;

        //return (int[,])array; //how to return?
        return new int[1, 1];
    }

    [TestMethod()]
    public void CreateArray_Test() {

        int[,] actual;

        int iHi = 10000 * 10 * 2;
        //'absolute minimum times array will be created   (200,000)
        //'could be as high as 10000 * 10 * 20? * 50? (100,000,000?)

        Stopwatch o;

        o = Stopwatch.StartNew();
        for (int i = 0; i < iHi; i++) { actual = CreateArray3(new Size(100, 100)); }
        o.Stop();
        Trace.WriteLine(o.ElapsedTicks / iHi, "CreateArray3; static unsafe");

        o = Stopwatch.StartNew();
        for (int i = 0; i < iHi; i++) { actual = CreateArray7(new Size(100, 100)); }
        o.Stop();
        Trace.WriteLine(o.ElapsedTicks / iHi, "CreateArray7; static unsafe for(y,x)");

        o = Stopwatch.StartNew();
        for (int i = 0; i < iHi; i++) { actual = CreateArray8(new Size(100, 100)); }
        o.Stop();
        Trace.WriteLine(o.ElapsedTicks / iHi, "CreateArray8; static unsafe pointer single_for");

    }

}

Upvotes: 0

Erich Mirabal
Erich Mirabal

Reputation: 10038

This is untested in this scenario, but has worked in similar ones. Sometimes, swapping the order of the array traversal speeds things up due to memory locality.

In other words, instead of doing for(x) ... for(y) do instead for(y) ... for(x).

Upvotes: 1

AMissico
AMissico

Reputation: 21684

Adding static and unsafe provide some reduction in Ticks. Following are some samples.

  • CreateArray; non-static non-unsafe: 521, 464, 453, 474
  • CreateArray; static: 430, 423, 418, 454
  • CreateArray; unsafe: 485, 464, 435, 414
  • CreateArray; static unsafe: 476, 450, 433, 405

I tried to use stackalloc. My idea was to allocate the array, which would not be initialized because it is unsafe code. I would then zip down the array, initalizing to int.MaxValue as I go, then Clone the array for the return result. But, I got stumped on the multi-dimensional declaration.

Then I remembered using Clone on arrays in another project. Each Array.Clone saved several seconds. Based on this idea, I created the following version of the CreateArray routine getting excellent results.

Now, all I need to do is get stackalloc to work with multi-dimensional arrays.

  • CreateArray5; pre-initialize: 2663, 3036
  • CreateArray5; clone: 157, 172

    private static bool _arrayInitialized5;
    private static int[,] _array5;
    
    public static int[,] CreateArray5(Size size) {
        if (!_arrayInitialized5) {
            int iX = size.Width;
            int iY = size.Height;
            _array5 = new int[iX, iY];
            for (int x = 0; x < iX; x++) {
                for (int y = 0; y < iY; y++) {
                    _array5[x, y] = int.MaxValue;
                }
            }
            _arrayInitialized5 = true;
        }
        return (int[,])_array5.Clone();
    }
    

    int[,] actual;

    int iHi = 10000 * 10 * 2; 
    //'absolute minimum times array will be created   (200,000)
    //'could be as high as 10000 * 10 * 20? * 50? (100,000,000?)

    Stopwatch o;

    //'pre-initialize
    o = Stopwatch.StartNew();
    actual = CreateArray5(new Size(100, 100));
    o.Stop();
    Trace.WriteLine(o.ElapsedTicks, "CreateArray5; pre-initialize");
    o = Stopwatch.StartNew();
    for (int i = 0; i < iHi; i++) { actual = CreateArray5(new Size(100, 100)); }
    o.Stop();
    Trace.WriteLine(o.ElapsedTicks / iHi, "CreateArray5; static unsafe clone");

Upvotes: 2

Akash Kava
Akash Kava

Reputation: 39916

You can use each row in parallel by using parallel library to initialize it, it will be faster.

However, I think there is limitation to this For, only 64 operations can be queued, but in that case you can queue 0 to 63, 64 to 127 etc.. in Parallel.For ..

public int[,] CreateArray(Size size) { 
    int[,] array = new int[size.Width, size.Height]; 
    System.Threading.Paralle.For (0,size.Width, 
      x=>{ 
        for (int y = 0; y < size.Height; y++) { 
            array[x, y] = int.MaxValue; 
        } 
      }
    ); 
    return array; 
} 

This is included in .NEt 4, however for .NET 3.51 you can get same library from codeplex.

Upvotes: 0

John K
John K

Reputation: 28869

Unmanaged Array for Granular Control

Create the array in C# unmanaged (unsafe) mode like shown here[code project] and initialize it manually.

In C# managed mode the array first initializes all elements to the default value before your loop starts to populate it. You can likely cut out the doubling up in unmanaged mode. This would save a lot of time.

Upvotes: 3

Related Questions