lhiapgpeonk

Reputation: 457

C# Where does the memory overhead come from

I'm having a little resource problem here. It seems that .NET is creating an awful lot of memory overhead and/or isn't releasing memory it should no longer need. But on to the problem:

I have an object of the following class, which reads an STL file:

public class cSTLBinaryDataModel
{
    public byte[] header { get; private set; }
    public UInt32 triangleCount { get { return Convert.ToUInt32(triangleList.Count); } }
    public List<cSTLTriangle> triangleList { get; private set; }

    public cSTLBinaryDataModel()
    {
        header = new byte[80];
        triangleList = new List<cSTLTriangle>();
    }

    public void ReadFromFile(string in_filePath)
    {
        byte[] stlBytes;
        //Memory logpoint 1
        stlBytes = File.ReadAllBytes(in_filePath);
        //Memory logpoint 2
        ReadHeader(stlBytes.SubArray(0, cConstants.BYTES_IN_HEADER));
        ReadTriangles(stlBytes.SubArray(cConstants.BYTES_IN_HEADER, stlBytes.Length - cConstants.BYTES_IN_HEADER));
        //Evaluate memory logpoints here
    }

    private void ReadHeader(byte[] in_header)
    {
        header = in_header;
    }

    private void ReadTriangles(byte[] in_triangles)
    {
        UInt32 numberOfTriangles = BitConverter.ToUInt32(cHelpers.HandleLSBFirst(in_triangles.SubArray(0, 4)), 0);
        //Memory logpoint 3
        for (UInt32 i = 0; i < numberOfTriangles; i++)
        {
            triangleList.Add(new cSTLTriangle(in_triangles.SubArray(Convert.ToInt32(i * cConstants.BYTES_PER_TRIANGLE + 4), Convert.ToInt32(cConstants.BYTES_PER_TRIANGLE))));
        }
        //Memory logpoint 4
    }
}

My STL file is quite big (but can get even bigger); it contains 10533050 triangles, so it's roughly 520 MB in size on disk. The class cSTLTriangle which is added to triangleList is the following:

public class cSTLTriangle
{
    public cVector normalVector { get; private set; }
    public cVector[] vertices { get; private set; }
    public UInt16 attributeByteCount { get; private set; }
    public bool triangleFilledWithExternalValues { get; private set; }

    public cSTLTriangle(byte[] in_bytes)
    {
        Initialize();
        normalVector = new cVector(BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(0, 4)), 0),
            BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(4, 4)), 0),
            BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(8, 4)), 0));
        vertices[0] = new cVector(BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(12, 4)), 0),
            BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(16, 4)), 0),
            BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(20, 4)), 0));
        vertices[1] = new cVector(BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(24, 4)), 0),
            BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(28, 4)), 0),
            BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(32, 4)), 0));
        vertices[2] = new cVector(BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(36, 4)), 0),
            BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(40, 4)), 0),
            BitConverter.ToSingle(cHelpers.HandleLSBFirst(in_bytes.SubArray(44, 4)), 0));
        attributeByteCount = BitConverter.ToUInt16(cHelpers.HandleLSBFirst(in_bytes.SubArray(48, 2)), 0);
        triangleFilledWithExternalValues = true;
    }

    public cSTLTriangle(cVector in_vertex1, cVector in_vertex2, cVector in_vertex3)
    {
        Initialize();
        vertices[0] = in_vertex1;
        vertices[1] = in_vertex2;
        vertices[2] = in_vertex3;
        normalVector = cVectorOperations.CrossProduct(cVectorOperations.GetDirectionVector(vertices[0], vertices[1]), cVectorOperations.GetDirectionVector(vertices[0], vertices[2]));
    }
    /// <summary>
    /// Resets object to a defined state
    /// </summary>
    private void Initialize()
    {
        vertices = new cVector[3];
        //from here on not strictly necessary, but it helps with resetting the object after an error
        normalVector = new cVector(0, 0, 0);
        vertices[0] = new cVector(0, 0, 0);
        vertices[1] = new cVector(0, 0, 0);
        vertices[2] = new cVector(0, 0, 0);
        attributeByteCount = 0;
        triangleFilledWithExternalValues = false;
    }
}

With the class cVector being: (Sorry for this much code)

public class cVector:ICloneable
{
    public float component1 { get; set; }
    public float component2 { get; set; }
    public float component3 { get; set; }
    public double Length { get { return Math.Sqrt(Math.Pow(component1, 2) + Math.Pow(component2, 2) + Math.Pow(component3, 2)); } }

    public cVector(float in_value1, float in_value2, float in_value3)
    {
        component1 = in_value1;
        component2 = in_value2;
        component3 = in_value3;
    }

    public object Clone()
    {
        return new cVector(component1, component2, component3);
    }
}

If I add up the sizes of the types used in my classes, it amounts to 51 bytes for one instance of cSTLTriangle. I am aware that there has to be some overhead to accommodate functions and such. But if I multiply this size by the number of triangles, I end up at 512.3 MB, which is quite in tune with the actual file size. I would imagine the triangleList takes up roughly the same amount of memory (again allowing for slight overhead; it's a List<T>, after all), but no! (I'm using GC.GetTotalMemory(false) to evaluate memory.)

From Logpoint 1 to Logpoint 2, there is an increase of 526660800 bytes, which is quite accurately the size of the STL file that is loaded into the byte array. Between Logpoint 2 and Logpoint 3 there is an increase of roughly the same amount, which is understandable, because I pass a subarray to the ReadTriangles method. SubArray is code I found here on SO (could this be the devil in disguise?):

public static T[] SubArray<T>(this T[] data, int index, int length)
{
    T[] result = new T[length];
    Array.Copy(data, index, result, 0, length);
    return result;
}

Things get ridiculous at the next Logpoint. Between Logpoint 3 and Logpoint 4 there is an increase in memory usage of roughly 4.73 times the size of the original STL file (as you can see, I make heavy use of .SubArray while parsing each triangle).

When I let the program continue, there is no significant increase in memory usage: good, but also no decrease at all: bad. I would expect the byte[] holding the file to release memory, since it goes out of scope, as does the sub array I passed to ReadTriangles(byte[] ...), but somehow they don't. And I end up with an "overhead" of 5.7 times the size of my raw STL data.

Is this usual behaviour? Does the .NET runtime keep memory allocated (even if it has been paged out to disk), just like Photoshop does, once it has got hold of some juicy RAM? How can I reduce the memory footprint of this combination of classes?

Upvotes: 3

Views: 609

Answers (3)

Chris

Reputation: 5514

Memory overhead

Your cVector class adds a lot of memory overhead. On a 32-bit system, any reference-type object has a memory overhead of 12 bytes (although 4 of those are free to be used by fields if possible), if I recall correctly. Let's go with an overhead of 8 bytes. So in your case, with roughly 10,000,000 triangles, each containing 4 vectors, that adds up to:

10,000,000 * 4 * 8 = 305 MB of overhead

If you're running on a 64-bit system it's twice that:

10,000,000 * 4 * 16 = 610 MB of overhead

On top of this, you also have the overhead of the four references each cSTLTriangle will have to the vectors, giving you:

10,000,000 * 4 * 4 = 152 MB (32-bit)

10,000,000 * 4 * 8 = 305 MB (64-bit)

As you can see this all builds up to quite a hefty bit of overhead.
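
The arithmetic above can be reproduced with a few lines of code. This is only a sketch: `OverheadEstimate` is a hypothetical helper name, and the per-object and per-reference byte counts passed in are the rough figures assumed in this answer, not measured values. The results are in units of 1024 * 1024 bytes, which is what the "MB" figures here correspond to.

```csharp
using System;

// Hypothetical helper that reproduces the back-of-the-envelope numbers above.
public static class OverheadEstimate
{
    private const double BytesPerMiB = 1024.0 * 1024.0;

    // Total size, in MiB, of `count` items that each cost `bytesEach` bytes.
    public static double ToMiB(long count, int bytesEach)
    {
        return count * (double)bytesEach / BytesPerMiB;
    }

    // Object-header overhead of the four cVector instances per triangle.
    public static double VectorHeaderMiB(long triangles, int headerBytes)
    {
        return ToMiB(triangles * 4, headerBytes);
    }

    // Cost of the four references each triangle holds to its vectors.
    public static double VectorReferenceMiB(long triangles, int referenceBytes)
    {
        return ToMiB(triangles * 4, referenceBytes);
    }
}
```

For example, `OverheadEstimate.VectorHeaderMiB(10000000, 8)` gives the 32-bit header figure and `OverheadEstimate.VectorReferenceMiB(10000000, 8)` the 64-bit reference figure.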

So, in this case, I would suggest you make cVector a struct. As discussed in the comments, a struct can implement interfaces (as well as properties and methods). Just be aware of the caveats that @Jcl mentioned.
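
A minimal sketch of what cVector could look like as a struct, keeping the member names from the question. The `: this()` constructor chaining and the decision to keep `ICloneable` are my own choices for illustration; `ICloneable` is retained only to show that a struct can implement an interface, and calling `Clone()` boxes the value, so plain assignment is the cheaper way to copy.

```csharp
using System;

// Sketch: cVector as a value type. Three floats are stored inline (12 bytes),
// with no per-instance object header and no reference pointing to it.
public struct cVector : ICloneable
{
    public float component1 { get; set; }
    public float component2 { get; set; }
    public float component3 { get; set; }

    public double Length
    {
        get
        {
            return Math.Sqrt(component1 * component1
                           + component2 * component2
                           + component3 * component3);
        }
    }

    // ": this()" zero-initializes the backing fields first, which older
    // compilers require before auto-properties are assigned in a struct.
    public cVector(float in_value1, float in_value2, float in_value3) : this()
    {
        component1 = in_value1;
        component2 = in_value2;
        component3 = in_value3;
    }

    // Returns a boxed copy; assigning a struct already copies it by value.
    public object Clone()
    {
        return new cVector(component1, component2, component3);
    }
}
```

Note that with a struct, `vertices[0].component1 = 1f;` still works on an array element, but modifying a vector obtained through a property (such as `normalVector`) changes only a copy.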

You have the same issue with your cSTLTriangle class (about 76/152 MB overhead for 32-bit and 64-bit, respectively), although at its size I'm not sure I want to recommend going with struct on that. Others here might have useful insights on that matter.

Additionally, due to padding and object layout, the overhead might actually be even larger, but I haven't taken that into account here.

List capacity

Using the List<T> class with that number of objects can waste some memory. As @Matthew Watson mentions, when the list's internal array has no more room, it will be expanded. In fact, it will double its capacity every time that happens. In a test with your number of 10533050 entries, the capacity of the list ended up at 16777216 entries, giving an overhead of:

( 16777216 - 10533050 ) * 4 byte reference = 23 MB (32-bit)

( 16777216 - 10533050 ) * 8 byte reference = 47 MB (64-bit)

So since you know the number of triangles in advance, I would recommend just going with a simple array. Manually setting the Capacity of a list works too.
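
The effect is easy to observe with a small experiment. The sketch below uses a hypothetical `ListCapacityDemo` class and `List<object>` filled with nulls, so it measures only the list's internal array, not the triangles themselves.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo: compares the final capacity of a list that grows on
// demand with one whose capacity is set up front.
public static class ListCapacityDemo
{
    // Adds `count` items one by one to an initially empty list.
    public static int CapacityAfterGrowing(int count)
    {
        var list = new List<object>();
        for (int i = 0; i < count; i++)
        {
            list.Add(null);
        }
        return list.Capacity;
    }

    // Same, but the internal array is allocated once with the final size.
    public static int CapacityWhenPresized(int count)
    {
        var list = new List<object>(count);
        for (int i = 0; i < count; i++)
        {
            list.Add(null);
        }
        return list.Capacity;
    }
}
```

Calling both methods with the same count shows how many unused slots the growing list ends up with; the presized list never reallocates, so its capacity stays at the requested value.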

Other issues

The other issues that have been discussed in the comments should not give you any memory overhead, but they sure will put a lot of unnecessary pressure on the GC. Especially the SubArray method, which, while very practical, will create many millions of garbage arrays for the GC to handle. I suggest skipping that and indexing into the array manually, even if it's more work.

Another issue is reading the entire file at once. This will be both slower and use more memory than reading it piece by piece. Directly using a BinaryReader as others have suggested might not be possible due to the endianness issues you need to deal with. One more complicated option could be to use memory-mapped files, which would let you access the data without having to care whether it has been read yet, leaving the details to the OS.
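
As a sketch of indexing manually: `BitConverter.ToSingle` and `BitConverter.ToUInt16` accept a start index, so no temporary arrays are needed. `TriangleParsing` and `ReadTriangle` are hypothetical names, and the code assumes the machine's byte order matches the file's (binary STL is little-endian, as are most machines running .NET). I don't know what `cHelpers.HandleLSBFirst` does exactly, so byte swapping for a big-endian machine is left out here.

```csharp
using System;

// Hypothetical helper: parses one 50-byte STL triangle record in place.
public static class TriangleParsing
{
    public const int BytesPerTriangle = 50;
    public const int FloatsPerTriangle = 12; // normal + 3 vertices, 3 floats each

    // Reads the 12 floats of the record starting at `offset` into `floats`
    // and returns the trailing 2-byte attribute count.
    // Assumes BitConverter.IsLittleEndian is true.
    public static ushort ReadTriangle(byte[] data, int offset, float[] floats)
    {
        for (int i = 0; i < FloatsPerTriangle; i++)
        {
            floats[i] = BitConverter.ToSingle(data, offset + i * 4);
        }
        return BitConverter.ToUInt16(data, offset + 48);
    }
}
```

The caller can reuse a single `float[12]` for every triangle, so parsing a record allocates nothing beyond the triangle object itself.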
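
A very small sketch of the memory-mapped approach, reading only the triangle count. `StlMappedFile` is a hypothetical name; the offset 80 is the header size used in the question, and the accessor reads values in the machine's own byte order, so the same little-endian assumption applies.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper: reads the triangle count through a memory-mapped view
// instead of loading the whole file into a byte array.
public static class StlMappedFile
{
    private const long HeaderBytes = 80;

    public static uint ReadTriangleCount(string path)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var accessor = mmf.CreateViewAccessor())
        {
            // The 4-byte count follows directly after the 80-byte header.
            return accessor.ReadUInt32(HeaderBytes);
        }
    }
}
```

Individual floats could be read the same way with `accessor.ReadSingle(position)`, where the position of triangle `i` would be `84 + i * 50`.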

(man I hope I got all these numbers right)

Upvotes: 3

Matthew Watson

Reputation: 109597

There are a couple of things you can try to decrease memory usage.

Firstly, if possible you should rewrite your file loading code so that it only loads the data it needs rather than loading the whole file at once.

For example, you could read the header as a single block, and then read the data for each triangle as a single block (in a loop).
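
A rough sketch of that loop, reusing one 50-byte buffer for every triangle. `StlStreamReader`, `ReadTriangles` and `ReadFully` are hypothetical names, the sizes 80 and 50 are taken from the question, and the count is decoded assuming a little-endian machine.

```csharp
using System;
using System.IO;

// Hypothetical reader: processes an STL stream block by block instead of
// loading the whole file with File.ReadAllBytes.
public static class StlStreamReader
{
    private const int HeaderBytes = 80;
    private const int BytesPerTriangle = 50;

    // Reads exactly `count` bytes into `buffer`, or throws at end of stream.
    private static void ReadFully(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, total, count - total);
            if (read == 0) throw new EndOfStreamException("Unexpected end of STL data.");
            total += read;
        }
    }

    // Calls `onTriangle` once per record. The same buffer is passed each time,
    // so the callback must parse it before returning and must not keep it.
    public static uint ReadTriangles(Stream stream, Action<byte[]> onTriangle)
    {
        var header = new byte[HeaderBytes];
        ReadFully(stream, header, HeaderBytes);

        var block = new byte[BytesPerTriangle];
        ReadFully(stream, block, 4);
        uint count = BitConverter.ToUInt32(block, 0); // assumes little-endian host

        for (uint i = 0; i < count; i++)
        {
            ReadFully(stream, block, BytesPerTriangle);
            onTriangle(block);
        }
        return count;
    }
}
```

It could be used as `using (var fs = File.OpenRead(path)) StlStreamReader.ReadTriangles(fs, block => ...);`, with the callback constructing a triangle from the block.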

Secondly, it's possible that your large object heap is suffering from fragmentation. By default the garbage collector doesn't move large objects, so the heap can't be defragmented. (This issue is addressed in .NET 4.5.1, but you have to enable large object heap compaction explicitly and trigger the collection yourself.)
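
For reference, requesting a one-off compaction looks roughly like this on .NET 4.5.1 or later. The wrapper class `LohCompaction` is a hypothetical name; the `GCSettings` property and enum are part of `System.Runtime`.

```csharp
using System;
using System.Runtime;

// Hypothetical wrapper around the large object heap compaction setting.
public static class LohCompaction
{
    // Asks the GC to compact the large object heap during the next blocking
    // full collection, then triggers that collection immediately.
    public static void CompactOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}
```

The setting applies to a single collection and then reverts to the default, so it has to be set again each time compaction is wanted.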

You may be able to mitigate this problem by pre-sizing your triangleList.

At the moment, you add each triangle to triangleList in turn, starting with a list with zero capacity. This means that every so often the list's capacity will be exceeded, causing it to be expanded.

When the list is expanded by adding an item to it when it's at capacity, it:

  • Creates a new internal buffer twice the size of the current buffer.
  • Copies the old buffer to the new one.
  • Discards the old buffer (it becomes garbage for the GC to collect).
  • Copies the new item to the new buffer.

The problem here is twofold:

  1. A lot of redundant copying is going on.
  2. If the internal buffer exceeds the threshold for putting objects on the large object heap, you might be getting heap fragmentation.

Since you know in advance the maximum size of the triangle list you can solve this issue by setting the list's capacity before adding items to it:

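//Capacity is an int, while numberOfTriangles is a UInt32, so a conversion is needed
triangleList.Capacity = Convert.ToInt32(numberOfTriangles);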

Upvotes: 2

Joe

Reputation: 173

After logpoint 2 maybe you could try splitting out the code a bit so that you have a

byte[] header
byte[] triangles

and once you're done splitting the original byte array set it to null and then you can use System.GC.Collect() to force the garbage collector to run. This should save you a bit of memory.

public void ReadFromFile(string in_filePath)
{
    byte[] stlBytes;
    //Memory logpoint 1
    stlBytes = File.ReadAllBytes(in_filePath);
    //Memory logpoint 2
    byte[] header = stlBytes.SubArray(0, cConstants.BYTES_IN_HEADER);
    byte[] triangles = stlBytes.SubArray(cConstants.BYTES_IN_HEADER, stlBytes.Length - cConstants.BYTES_IN_HEADER);
    //Done splitting: drop the reference to the original array so it can be collected
    stlBytes = null;
    ReadHeader(header);
    ReadTriangles(triangles);
    System.GC.Collect();
    //Evaluate memory logpoints here
}

Upvotes: 0
