ths

Reputation: 2942

collection Clear() vs new, GC impact

I create a bunch of MemoryStreams in a loop and add them to a collection (an ArrayList in this case).
Afterwards I iterate over this list and process those streams. Because I ran into OutOfMemoryExceptions, I decided to instead periodically process the list and then free it.

Doing this via list = new ArrayList(), however, did nothing to change the memory consumption, neither when monitoring it nor in eliminating the OutOfMemoryExceptions. Even calling GC.Collect() didn't change that. I noticed that the memory was only freed after leaving the scope.

Calling list.Clear(), however, immediately freed the memory and the loop worked as expected.

So, why this difference? A number of other topics here leave the impression that the two methods should be essentially the same, with list = new ArrayList() possibly being more efficient, since Clear() is an O(n) operation.

I'm pretty sure that there are no other references to my MemoryStreams extant (I basically just do list.Add(new MemoryStream(...))).

Upvotes: 2

Views: 184

Answers (3)

Hans Passant

Reputation: 942318

Well, there is a difference. ArrayList.Clear() sets all the elements to null. Which makes those elements immediately eligible for collection.

If you reallocate the ArrayList then it matters exactly when the original ArrayList gets collected. Only then will the elements get collected as well. If the original ArrayList is large (more than 7083 items) then its underlying array is going to end up in the Large Object Heap. Which doesn't get collected very often. So the elements stay around for a while as well. Increasing the odds for OOM.
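A minimal sketch of the two options described above (the stream size is arbitrary, just something big enough to matter in a 32-bit process):

```csharp
using System.Collections;
using System.IO;

class ClearVsNew
{
    static void Main()
    {
        var list = new ArrayList();
        list.Add(new MemoryStream(new byte[16 * 1024 * 1024]));

        // Option 1: Clear() nulls every slot of the underlying object[],
        // so the MemoryStream becomes eligible for collection immediately,
        // even while 'list' itself stays reachable.
        list.Clear();

        // Option 2: reallocating only drops this one reference. The old
        // ArrayList, and every stream it still points to, survives until
        // the GC gets around to collecting the list object itself.
        list = new ArrayList();
    }
}
```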

You ought to look at the big picture here, your program is teetering on the edge of still being able to do its job. That rarely gets better over time. You'll need to seriously consider a drastic rewrite, one that, say, halves the VM usage so you'll have some breathing room for a while. Or take the drastically simple solution. Flip the switch and target a 64-bit operating system. Widely available today.

Upvotes: 3

ths

Reputation: 2942

Following @user2864740's comment, I wrote a little test routine and they're right: the effect only appears in Debug mode. Furthermore, only if I new up the list at the end of the loop, not when the same statement is moved to the beginning:

    static void Main(string[] args)
    {
        using (StreamWriter w = new StreamWriter(@"d:\tststream.123", false, Encoding.Default))
            for (int i = 0; i < (1 << 20); i++) 
                w.WriteLine(Guid.NewGuid());

        List<MemoryStream> list = new List<MemoryStream>();
        for (int j = 0; j < 100; j++)
        {
            for (int i = 0; i < 30; i++)
            {
                list.Add(new MemoryStream(File.ReadAllBytes(@"d:\tststream.123")));
            }
            list = new List<MemoryStream>();
            Console.WriteLine(j.ToString());
        }
    }

This throws an OutOfMemoryException (in 32-bit, of course) when compiled in Debug mode, on the second iteration. Compile it in Release or move the list = new List<MemoryStream>(); to the beginning of the loop, and it continues "indefinitely".

Upvotes: 0

usr

Reputation: 171246

In Debug mode the JIT extends the lifetime of all local variables (physically: the stack locations and registers holding references) to the end of the method, so you can see seemingly random object lifetime extensions.

This behavior does not violate GC guarantees. A GC which never deletes anything is a valid GC, albeit not a useful one.

Explicitly clearing variables to null and factoring out functions can help here.
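For example, a sketch of the factoring-out idea (hypothetical method names): move the batch into a helper, so the reference is a local of that helper and dies with its stack frame even when the JIT keeps locals alive to the end of their own method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Factored
{
    // 'batch' is a local of this method only. Once ProcessBatch returns,
    // no stack slot in the caller can keep the old list alive -- even in
    // a Debug build, where locals live to the end of their OWN method.
    static void ProcessBatch()
    {
        var batch = new List<MemoryStream>();
        for (int i = 0; i < 30; i++)
            batch.Add(new MemoryStream(new byte[1024]));
        // ... process the streams here ...
        batch = null;          // explicit null: belt-and-braces in Debug
    }

    static void Main()
    {
        for (int j = 0; j < 100; j++)
        {
            ProcessBatch();
            GC.Collect();      // the old batch is unreachable at this point
        }
    }
}
```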

When you overwrite a reference variable with list = new ArrayList() there might still be other object references to the old list. They might be explicit somewhere in your code, or just random local variables that happen to still hold the old reference but are unused.

Closures are also prone to capture references.
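A sketch of that pitfall (hypothetical names): the C# compiler hoists all captured locals of one scope into a single compiler-generated closure class, so keeping any one lambda from that scope alive keeps every captured local alive.

```csharp
using System;
using System.IO;

class ClosureCapture
{
    static Action _handler;

    static void Setup()
    {
        var big = new MemoryStream(new byte[16 * 1024 * 1024]);
        int counter = 0;

        Action useBig = () => Console.WriteLine(big.Length); // captures big
        Action tick   = () => counter++;                     // captures counter

        useBig();

        // Both locals are hoisted into the SAME closure object, because
        // they belong to the same scope. Storing only 'tick' therefore
        // keeps 'big' reachable too, long after it was last used.
        _handler = tick;
    }
}
```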

Upvotes: 1
