Reputation: 7512
Is there any overhead to using a struct shim to replace a jagged array?
To give a specific example
vertices = new KeyValuePair<uint, EdgeData>[][];
vs
private struct Vertex
{
public KeyValuePair<uint, EdgeData>[] Arcs { get; set; }
}
vertices = new KeyValuePair<uint, Vertex>[];
EdgeData is a class if it makes any difference
Obviously the intent is clearer in the struct example but it needs to be able to hold massive graphs so any memory overhead is significant
Upvotes: 1
Views: 921
Reputation: 133950
Replacing the 2D array with a 1D array of structs will not cause any issues. It's really a matter of how you view the data. If it makes more sense to model it as an array of structures, each of which contains an array of arcs, then that's the way you should express it in code.
There are some small differences in the way that they're stored. In particular, your 1D array approach will occupy more memory, total, than the 2D array approach. Basically, you have an extra uint
for each row.
Following struck. It's discussing the difference between the struct approach and a 2D array (i.e. [,]
), rather than the jagged array ([][]
) that the OP is using.
Actually, the total memory used would be more than that. In the 2D array approach, you have (row * col) KeyValuePair
structures in an array. The array has an allocation overhead of about 50 bytes in the 64-bit runtime (about 40 bytes, if I recall, in the 32-bit runtime). In the 1D array approach, you still have (row * col) KeyValuePair
structures, but each one contains an array that has the same 50 byte allocation overhead. In addition, you have the vertices
array, that contains (row) KeyvaluePair
structures.
However, your 2D array (just the array) would require (rows * cols * (4 + sizeof(IntPtr)))
bytes. The 1D vertices
array will only require (rows * (4 + sizeof(IntPtr)))
bytes. If you're limited to 2 gigabytes for a single array (as you are in .NET 4.0 and earlier, or in .NET 4.5 unless you enable very large objects), then you can potentially have more items, total, using the 1D array of structs than with the 2D array. Assuming, of course, that you have enough memory to hold that many KeyValuePair<uint, EdgeData>
instances.
So your overall memory usage will increase, but your largest single allocation will be much smaller.
Upvotes: 2
Reputation: 81115
Arrays of structures tend to be rather efficient, though in your particular example you have an extra uint for each row. Also, avoid exposing properties of structure types, and if a structure represents a collection of independent values bound together with duct tape (e.g. the coordinates of a point), simply expose those items as fields. While there are many situations in which the JIT will convert property accesses to field accesses, there are also many where it cannot.
If one is comparing the efficiency of:
struct FloatPoint2D {public float X,Y;}
FloatPoint3D[] MyArray;
versus
float[] MyXCoords, MyYCoords;
Accessing both the X and Y of items in random sequence will be faster with the struct defined above than with a pair of separate arrays (typically one cache miss instead of two) but accessing just the X or just the Y coordinate of many items in sequence will be faster if one is using separate arrays (twice as many useful coordinates will be fetched with each cache line).
In your particular example, it's unclear exactly what data your type needs to encapsulate; your struct and non-struct examples hold different data, so it's hard to say one is "more efficient".
Upvotes: 2
Reputation: 74177
A struct
may or may not be allocated on the stack. Reference types can never be allocated on the stack; they are always allocated on the heap.
From the Standard (ISO 23270), § 8.8:
8.8 Structs The list of similarities between classes and structs is long—structs can implement interfaces, and can have the same kinds of members as classes. Structs differ from classes in several important ways, however: structs are value types rather than reference types, and inheritance is not supported for structs. Struct values are stored “on the stack” or “in-line”. Careful programmers can sometimes enhance performance through judicious use of structs.
For example, the use of a struct rather than a class for a Point can make a large difference in the number of memory allocations performed at run time. The program below creates and initializes an array of 100 points.
With
Point
implemented as a class, 101 separate objects are instantiated—one for the array and one each for the 100 elements.class Point { public int x, y; public Point(int x, int y) { this.x = x; this.y = y; } } class Test { static void Main() { Point[] points = new Point[100]; for (int i = 0; i < 100; i++) { points[i] = new Point(i, i*i); } }
If
Point
is instead implemented as a struct, as instruct Point { public int x, y; public Point(int x, int y) { this.x = x; this.y = y; } }
only one object is instantiated—the one for the array. The Point instances are allocated in-line within the array. This optimization can be misused. Using structs instead of classes can also make an application run slower or take up more memory, as passing a struct instance by value causes a copy of that struct to be created.
So the answer is "Maybe".
For your example, wrapping an array (a reference type) within a struct
(a value type), doesn't mean anything: that array is still allocated on the heap.
If however, you change your class EdgeData
to a struct, it can be (but may not be) allocated in-line within the array. So if your EdgeData
class is, for instance, 16 bytes in size, and you create and populate an EdgeData[]
of 100 entries, you are actually allocating 1 array instance (with a backing store sized to hold 100 object references, and 100 individual instances of your EdgeData
class.
If EdgeData
is a struct, you allocate 1 array with a backing store sized to hold 100 EdgeData
instances (in this case, 1600 bytes, since our hypothetical EdgeData
struct is 16 bytes in size.)
Iterating over the class version of the array, especially if the array is very large, may cause paging, since you are probably losing locality of reference as you jump all over the heap to hit the individual EdgeData
instances.
Iteration over the struct
version of the array preserves locality of reference, since the EdgeData
instances are in-line.
Upvotes: 4