Reputation: 21684
How do I initialize a multi-dimensional array of a primitive type as fast as possible?
I am stuck with using multi-dimensional arrays. My problem is performance. The following routine initializes a 100x100 array in approx. 500 ticks. Removing the int.MaxValue initialization results in approx. 180 ticks just for the looping. Approximately 100 ticks to create the array without looping and without initializing to int.MaxValue.
I am open to suggestions on how to optimize this non-default initialization of an array. One idea I had is to use a smaller primitive type when available. For instance, using byte instead of int, saves 100 ticks. I would be happy with this, but I am hoping that I don't have to change the primitive data type.
public int[,] CreateArray(Size size) {
int[,] array = new int[size.Width, size.Height];
for (int x = 0; x < size.Width; x++) {
for (int y = 0; y < size.Height; y++) {
array[x, y] = int.MaxValue;
}
}
return array;
}
Down to 450 ticks with the following:
public int[,] CreateArray1(Size size) {
int iX = size.Width;
int iY = size.Height;
int[,] array = new int[iX, iY];
for (int x = 0; x < iX; x++) {
for (int y = 0; y < iY; y++) {
array[x, y] = int.MaxValue;
}
}
return array;
}
Down to approximately 165 ticks after a one-time initialization of 2800 ticks. (See my answer below.) If I can get stackalloc
to work with multi-dimensional arrays, I should be able to get the same performance without having to intialize the private static
array.
private static bool _arrayInitialized5;
private static int[,] _array5;
public static int[,] CreateArray5(Size size) {
if (!_arrayInitialized5) {
int iX = size.Width;
int iY = size.Height;
_array5 = new int[iX, iY];
for (int x = 0; x < iX; x++) {
for (int y = 0; y < iY; y++) {
_array5[x, y] = int.MaxValue;
}
}
_arrayInitialized5 = true;
}
return (int[,])_array5.Clone();
}
Down to approximately 165 ticks without using the "clone technique" above. (See my answer below.) I am sure I can get the ticks lower, if I can just figure out the return of CreateArray9
.
public unsafe static int[,] CreateArray8(Size size) {
int iX = size.Width;
int iY = size.Height;
int[,] array = new int[iX, iY];
fixed (int* pfixed = array) {
int count = array.Length;
for (int* p = pfixed; count-- > 0; p++)
*p = int.MaxValue;
}
return array;
}
I am providing all code and notes regarding this question as answers. Hopefully, it will save someone's time in the future.
Arrays allocated on the Large Object Heap (LOH) are not part of this discussion. The performance improvements noted hear are only for arrays allocated on the heap.
My idea of using stackalloc
to eliminate initializing array to default values did not work out because the allocated stack memory must be copied out of the method. Meaning, I would have to create another array to hold the results. This array would be initialized defeating the whole purpose of using stackalloc
.
The CLR will only execute unsafe
code if it is in a fully trusted assembly.
Requires variables to determine if array is initialized and to store the initialized array. Performance is the same as the unsafe/fixed method after initialization. See Dan Tao's answer for possible solution.
I suck at percentages, but 300% is what I figured (500 to 165 ticks).
For this application, I settled on using the "clone" method. Following is a "lean" Generic implementation used in the application with performance samples.
Initialization:
Grid<int>
; generic clone class initalize: 4348, 4336, 4339, 4654Grid<bool>
; generic clone class initalize: 2692, 2684, 3916, 2680Grid<Color>
; generic clone class initalize: 3747, 4630, 2702, 2708Use:
Grid<int>
; generic clone class: 185, 159, 152, 290Grid<bool>
; generic clone class: 39, 36, 44, 46Grid<Color>
; generic clone class: 2229, 2431, 2460, 2496
public class Grid<T> {
private T[,] _array;
private T _value;
private bool _initialized;
private int _x;
private int _y;
public Grid(Size size, T value, bool initialize) {
_x = size.Width;
_y = size.Height;
_value = value;
if (initialize) {
InitializeArray();
}
}
private void InitializeArray() {
int iX = _x;
int iY = _y;
_array = new T[iX, iY];
for (int y = 0; y < iY; y++) {
for (int x = 0; x < iX; x++) {
_array[x, y] = _value;
}
}
_initialized = true;
}
public T[,] CreateArray() {
if (!_initialized) {
InitializeArray();
}
return (T[,])_array.Clone();
}
}
Upvotes: 5
Views: 4651
Reputation: 21684
Class and Generic using "Clone" method.
Grid<int>
; generic clone class initalize: 5344, 5334, 5693, 5272Grid<int>
; generic clone class: 187, 204, 199, 288Grid<bool>
; generic clone class initalize: 3585, 3537, 3552, 3569Grid<bool>
; generic clone class: 37, 44, 36, 43Grid<Color>
; generic clone class initalize: 4139, 3536, 3503, 3533Grid<Color>
; generic clone class: 2737, 3137, 2414, 2171[TestClass]
public class CreateArrayTest {
public class MDArray {
private bool _initialized;
private int[,] _array;
private int _x;
private int _y;
private int _value;
public MDArray(Size size, int value, bool initialize) {
_x = size.Width;
_y = size.Height;
_value = value;
if (initialize) {
InitializeArray();
}
}
private void InitializeArray() {
int iX = _x;
int iY = _y;
_array = new int[iX, iY];
for (int y = 0; y < iY; y++) {
for (int x = 0; x < iX; x++) {
_array[x, y] = _value;
}
}
_initialized = true;
}
public int[,] CreateArray() {
if (!_initialized) {
InitializeArray();
}
return (int[,])_array.Clone();
}
}
public class Grid<T> {
private T[,] _array;
private T _value;
private bool _initialized;
private int _x;
private int _y;
public Grid(Size size, T value, bool initialize) {
_x = size.Width;
_y = size.Height;
_value = value;
if (initialize) {
InitializeArray();
}
}
private void InitializeArray() {
int iX = _x;
int iY = _y;
_array = new T[iX, iY];
for (int y = 0; y < iY; y++) {
for (int x = 0; x < iX; x++) {
_array[x, y] = _value;
}
}
_initialized = true;
}
public T[,] CreateArray() {
if (!_initialized) {
InitializeArray();
}
return (T[,])_array.Clone();
}
}
[TestMethod()]
public void CreateArray_Test() {
int[,] actual;
int iHi = 10000 * 10 * 2; //'absolute minimum times array will be created (200,000)
// 'could be as high as 10000 * 10 * 20? * 50? (100,000,000?)
Stopwatch o;
o = Stopwatch.StartNew();
MDArray oMDArray = new MDArray(new Size(100, 100), int.MaxValue, true);
o.Stop();
Trace.WriteLine(o.ElapsedTicks, " MDArray; clone class initalize");
o = Stopwatch.StartNew();
for (int i = 0; i < iHi; i++) { actual = oMDArray.CreateArray(); }
o.Stop();
Trace.WriteLine(o.ElapsedTicks / iHi, " MDArray; clone class");
o = Stopwatch.StartNew();
Grid<int> oIntMap = new Grid<int>(new Size(100, 100), int.MaxValue, true);
o.Stop();
Trace.WriteLine(o.ElapsedTicks, " Grid<int>; generic clone class initalize");
o = Stopwatch.StartNew();
for (int i = 0; i < iHi; i++) { actual = oIntMap.CreateArray(); }
o.Stop();
Trace.WriteLine(o.ElapsedTicks / iHi, " Grid<int>; generic clone class");
bool[,] fActual;
o = Stopwatch.StartNew();
Grid<bool> oBolMap = new Grid<bool>(new Size(100, 100), true, true);
o.Stop();
Trace.WriteLine(o.ElapsedTicks, " Grid<bool>; generic clone class initalize");
o = Stopwatch.StartNew();
for (int i = 0; i < iHi; i++) { fActual = oBolMap.CreateArray(); }
o.Stop();
Trace.WriteLine(o.ElapsedTicks / iHi, " Grid<bool>; generic clone class");
Color[,] oActual;
o = Stopwatch.StartNew();
Grid<Color> oColMap = new Grid<Color>(new Size(100, 100), Color.AliceBlue, true);
o.Stop();
Trace.WriteLine(o.ElapsedTicks, " Grid<Color>; generic clone class initalize");
o = Stopwatch.StartNew();
for (int i = 0; i < iHi; i++) { oActual = oColMap.CreateArray(); }
o.Stop();
Trace.WriteLine(o.ElapsedTicks / iHi, " Grid<Color>; generic clone class");
}
}
Upvotes: 0
Reputation: 128327
A note on your Clone
approach: I doubt you will be able to beat it, in terms of performance. However, it could be a breaking change considering that after the first initialization, it disregards the Size
parameter and just returns an array of the same size on every call. Depending on whether that actually matters in your scenario, you could either:
Dictionary<Size, int[,]>
(I believe Size
would behave properly as a key -- haven't tested) to pre-initialize an array every time a unique Size
is requested. The overhead of this I am not sure of.Clone
idea.In case you end up having to go with 3 above, here are a few borderline ridiculous suggestions:
1. Cache your Width
and Height
properties locally, rather than accessing them from the Size
struct on each iteration.
static int[,] CreateArray(Size size) {
int w = size.Width;
int h = size.Height;
int[,] array = new int[w, h];
for (int x = 0; x < w; x++) {
for (int y = 0; y < h; y++) {
array[x, y] = int.MaxValue;
}
}
return array;
}
To create a 1000x1000 array, on my machine, this resulted in an average execution time of about 120000 ticks versus about 140000 ticks.
2. Leverage multiple cores if you have them and initialize the array in parallel.
static int[,] CreateArray(Size size) {
int w = size.Width;
int h = size.Height;
int[,] array = new int[w, h];
Action<int[,], int, int> fillFirstHalf = FillArray;
Action<int[,], int, int> fillSecondHalf = FillArray;
var firstResult = fillFirstHalf.BeginInvoke(array, 0, h / 2, null, null);
var secondResult = fillSecondHalf.BeginInvoke(array, h / 2, h, null, null);
fillFirstHalf.EndInvoke(firstResult);
fillSecondHalf.EndInvoke(secondResult);
return array;
}
static void FillArray(int[,] array, int ystart, int yend) {
int w = array.GetLength(0);
for (int x = 0; x < w; ++x) {
for (int y = ystart; y < yend; ++y) {
array[x, y] = int.MaxValue;
}
}
}
This one probably isn't a very realistic suggestion in your scenario, since it seems that you're only creating 100x100 arrays, in which case the overhead of the parallelization exceeds the performance gain. However, for creating a 1000x1000 array, I found that this approach reduced my execution times down to about 70k ticks on average (compared to the ~120k ticks I got from the first optimization I suggested).
Also, if you are creating many arrays this way, I would highly recommend parallelizing that (i.e., if you need to create a thousand arrays, create 500 each from two threads), assuming you have multiple processors to do the work for you. Without multiple processors, forget it; adding threads will only hurt your performance.
3. Get enhanced performance by using an unsafe
pointer.
Now here's an interesting discovery: it appears that a two-dimensional array in .NET is allocated in a predictable way*: basically as a one-dimensional block of memory, where each "row" is offset from the starting point by an amount equivalent to the length of all previous rows. In other words, a 10x2 array can be accessed using pointer just like a 20x1 array; a 10x10 array can be accessed like a 100x1 array, etc.
I have no idea if this is documented behavior or not. It may be an unspecified implementation detail that you don't want to depend on. Either way, it's worth looking into.
* It's possible that most other .NET developers already knew this and I'm just stating the obvious, in which case, I rescind my comment about this being "interesting".
In any case, knowledge of this allows you to exploit the fixed
keyword in an unsafe
context for a significant performance gain:
static int[,] CreateArray(Size size) {
int w = size.Width;
int h = size.Height;
int[,] array = new int[w, h];
unsafe {
fixed (int* ptr = array) {
for (int i = 0; i < w * h; ++i)
ptr[i] = int.MaxValue;
}
}
return array;
}
For initializing arrays of a signifcant size, I would even recommend combining the above approach (parallelization) with this one -- so, keep the same CreateArray
from suggestion #2, and then rewrite FillArray
as:
static void FillArray(int[,] array, int ystart, int yend) {
int w = array.GetLength(0);
unsafe {
fixed (int* p = array) {
for (int i = w * ystart; i < w * yend; ++i)
p[i] = int.MaxValue;
}
}
}
It actually seems that you already figured out this last part before I posted this, but I thought I'd include it anyway mainly for the point about combining unsafe
with parallelization.
A note on stackalloc
: I think you may be chasing after the leprechaun at the end of the rainbow with this one. According to the documentation on stackalloc
:
A block of memory of sufficient size to contain
expr
elements of typetype
is allocated on the stack, not the heap; the address of the block is stored in pointerptr
. This memory is not subject to garbage collection and therefore does not have to be pinned (viafixed
). The lifetime of the memory block is limited to the lifetime of the method in which it is defined. (emphasis mine)
This leads me to believe that you cannot return an object whose data is stored in memory allocated by stackalloc
from a function, because that memory is only allocated for the lifetime of the function.
Upvotes: 4
Reputation: 21684
I was able to get the ticks down to approximately 165. See CreateArray8
below.
I got an idea from the "Range Check Elimination" section of the CodeProject article at http://www.codeproject.com/KB/dotnet/arrays.aspx provided by jdk. (@jdk, thank you very much.) The idea was to eliminate the range checking by using a pointer and initializing each element in one loop. I was able to get the ticks down to approximately 165. Just as good as cloning without the delay of pre-initialization and the supporting static variables. (See my other answer.)
I bet I can cut this in half, if I can just figure out the return of CreateArray9
.
[TestClass]
public class CreateArrayTest {
public static unsafe int[,] CreateArray3(Size size) {
int iX = size.Width;
int iY = size.Height;
int[,] array = new int[iX, iY];
for (int x = 0; x < iX; x++) {
for (int y = 0; y < iY; y++) {
array[x, y] = int.MaxValue;
}
}
return array;
}
public unsafe static int[,] CreateArray7(Size size) {
int iX = size.Width;
int iY = size.Height;
int[,] array = new int[iX, iY];
for (int y = 0; y < iY; y++) {
for (int x = 0; x < iX; x++) {
array[x, y] = int.MaxValue;
}
}
return array;
}
public unsafe static int[,] CreateArray8(Size size) {
int iX = size.Width;
int iY = size.Height;
int[,] array = new int[iX, iY];
fixed (int* pfixed = array) {
int count = array.Length;
for (int* p = pfixed; count-- > 0; p++)
*p = int.MaxValue;
}
return array;
}
public unsafe static int[,] CreateArray9(Size size) {
int iX = size.Width;
int iY = size.Height;
void* array = stackalloc int[iX * iY];
int count = iX * iY;
for (int* p = (int*)array; count-- > 0; p++)
*p = int.MaxValue;
//return (int[,])array; //how to return?
return new int[1, 1];
}
[TestMethod()]
public void CreateArray_Test() {
int[,] actual;
int iHi = 10000 * 10 * 2;
//'absolute minimum times array will be created (200,000)
//'could be as high as 10000 * 10 * 20? * 50? (100,000,000?)
Stopwatch o;
o = Stopwatch.StartNew();
for (int i = 0; i < iHi; i++) { actual = CreateArray3(new Size(100, 100)); }
o.Stop();
Trace.WriteLine(o.ElapsedTicks / iHi, "CreateArray3; static unsafe");
o = Stopwatch.StartNew();
for (int i = 0; i < iHi; i++) { actual = CreateArray7(new Size(100, 100)); }
o.Stop();
Trace.WriteLine(o.ElapsedTicks / iHi, "CreateArray7; static unsafe for(y,x)");
o = Stopwatch.StartNew();
for (int i = 0; i < iHi; i++) { actual = CreateArray8(new Size(100, 100)); }
o.Stop();
Trace.WriteLine(o.ElapsedTicks / iHi, "CreateArray8; static unsafe pointer single_for");
}
}
Upvotes: 0
Reputation: 10038
This is untested in this scenario, but has worked in similar ones. Sometimes, swapping the order of the array traversal speeds things up due to memory locality.
In other words, instead of doing for(x) ... for(y)
do instead for(y) ... for(x)
.
Upvotes: 1
Reputation: 21684
Adding static
and unsafe
provide some reduction in Ticks. Following are some samples.
I tried to use stackalloc
. My idea was to allocate the array, which would not be initialized because it is unsafe
code. I would then zip down the array, initalizing to int.MaxValue
as I go, then Clone
the array for the return result. But, I got stumped on the multi-dimensional declaration.
Then I remembered using Clone
on arrays in another project. Each Array.Clone
saved several seconds. Based on this idea, I created the following version of the CreateArray
routine getting excellent results.
Now, all I need to do is get stackalloc
to work with multi-dimensional arrays.
CreateArray5; clone: 157, 172
private static bool _arrayInitialized5;
private static int[,] _array5;
public static int[,] CreateArray5(Size size) {
if (!_arrayInitialized5) {
int iX = size.Width;
int iY = size.Height;
_array5 = new int[iX, iY];
for (int x = 0; x < iX; x++) {
for (int y = 0; y < iY; y++) {
_array5[x, y] = int.MaxValue;
}
}
_arrayInitialized5 = true;
}
return (int[,])_array5.Clone();
}
int[,] actual;
int iHi = 10000 * 10 * 2;
//'absolute minimum times array will be created (200,000)
//'could be as high as 10000 * 10 * 20? * 50? (100,000,000?)
Stopwatch o;
//'pre-initialize
o = Stopwatch.StartNew();
actual = CreateArray5(new Size(100, 100));
o.Stop();
Trace.WriteLine(o.ElapsedTicks, "CreateArray5; pre-initialize");
o = Stopwatch.StartNew();
for (int i = 0; i < iHi; i++) { actual = CreateArray5(new Size(100, 100)); }
o.Stop();
Trace.WriteLine(o.ElapsedTicks / iHi, "CreateArray5; static unsafe clone");
Upvotes: 2
Reputation: 39916
You can use each row in parallel by using parallel library to initialize it, it will be faster.
However, I think there is limitation to this For, only 64 operations can be queued, but in that case you can queue 0 to 63, 64 to 127 etc.. in Parallel.For ..
public int[,] CreateArray(Size size) {
int[,] array = new int[size.Width, size.Height];
System.Threading.Paralle.For (0,size.Width,
x=>{
for (int y = 0; y < size.Height; y++) {
array[x, y] = int.MaxValue;
}
}
);
return array;
}
This is included in .NEt 4, however for .NET 3.51 you can get same library from codeplex.
Upvotes: 0
Reputation: 28869
Create the array in C# unmanaged (unsafe) mode like shown here[code project] and initialize it manually.
In C# managed mode the array first initializes all elements to the default value before your loop starts to populate it. You can likely cut out the doubling up in unmanaged mode. This would save a lot of time.
Upvotes: 3