How to detect a .NET WPF memory leak or long-running GC?

I have the following very strange situation and problem:

PROBLEM: It freezes/hangs after seconds or minutes of use on some of my beta testers' slow/weak laptops/netbooks (under 2 GHz and under 2 GB of RAM). I was thinking of a memory leak, but...

EDIT 1: Also, there is the case where the memory usage grows and grows until collapse (only on slow machines).

EDIT 2: The problem is worse when the system has some other heavy programs running (like Visual Studio, Office and web pages open). Not even the first symbol of the diagram can be created while the memory usage takes off like a rocket (hundreds of MB consumed in seconds). Anyone with similar experiences? What were your strategies?

So I'm really in big trouble: I cannot reproduce the error, I only see this strange behaviour (memory jumps), and the tool that is supposed to show me what is happening hides the problem (like the "observer's paradox").

Any ideas on what's happening and how to solve it?

EDIT 3: This screenshot from the ANTS memory profiler shows that the huge, steadily growing RAM consumption is from unmanaged resources.

[Screenshot: ANTS memory profiler showing unmanaged memory growth]

But what can be consuming so much memory, so fast?!

Upvotes: 6

Views: 3524

Answers (3)

Hans Passant

Reputation: 941287

What you describe is all entirely normal behavior for a .NET program; there is no indication that there's anything wrong with your code.

By far the biggest issue is that TaskMgr.exe is just not a very good program to tell you what's happening in your process. It displays the "working set" for a process, a number that has very little to do with the amount of memory the process uses.

Working set is the amount of RAM that your process uses. Every process gets 2 gigabytes of virtual memory to use for code and data, even on your virtual XP box with only 512 MB of RAM. All of those processes, however, have to share a limited amount of real RAM; on a lowly machine that can be as little as a gigabyte.
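If you want to see the distinction from inside the process itself, here is a minimal sketch (nothing specific to your app, just the standard System.Diagnostics.Process properties) that prints the numbers Task Manager summarizes:

    using System;
    using System.Diagnostics;

    class MemoryNumbers
    {
        static void Main()
        {
            Process self = Process.GetCurrentProcess();

            // Working set: the part of the process's memory currently resident in RAM.
            Console.WriteLine("Working set:   {0:N0} bytes", self.WorkingSet64);

            // Private bytes: memory committed to this process (managed + unmanaged),
            // whether or not it is currently paged in.
            Console.WriteLine("Private bytes: {0:N0} bytes", self.PrivateMemorySize64);

            // Virtual size: the address space reserved, usually far larger than either.
            Console.WriteLine("Virtual size:  {0:N0} bytes", self.VirtualMemorySize64);
        }
    }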

Clearly, having multiple processes running, each with gigabytes of virtual memory, on only a gigabyte of real memory takes some magic. That magic is provided by the operating system: Windows virtualizes the RAM. In other words, it creates the illusion for each process that it is running on its own on a machine with 2 gigabytes of RAM. This is done by a feature called paging: whenever a process needs to read or write memory, the operating system grabs a chunk of RAM to provide the physical memory.

Inevitably, it has to take away some RAM from another process so that it can be made available to yours. Whatever was previously in that chunk of RAM needs to be preserved; that's what the paging file does, it stores the content of RAM that was paged out.

Clearly this does not come for free: disks are pretty slow and paging is an expensive operation. That's why lowly machines perform poorly when you ask them to run several large programs. The real measure for this is also visible in TaskMgr.exe, but you have to add the column: View + Select Columns and tick "Page fault delta". Observe this number while your process runs. When you see it spike, you can expect your program to slow down a great deal and the displayed memory usage to change rapidly.
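If you'd rather watch that same number programmatically than in Task Manager, here is a rough sketch using the stock Windows "Process" performance-counter category; the counter name is the standard one, and note that the instance name may need a "#1"-style suffix when several copies of the process are running:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class PageFaultWatcher
    {
        static void Main()
        {
            // "Page Faults/sec" in the "Process" category is roughly what the
            // "Page fault delta" column shows in Task Manager.
            string instance = Process.GetCurrentProcess().ProcessName;
            using (var counter = new PerformanceCounter("Process", "Page Faults/sec", instance))
            {
                counter.NextValue();                 // first read only primes the counter
                for (int i = 0; i < 10; i++)
                {
                    Thread.Sleep(1000);
                    Console.WriteLine("Page faults/sec: {0:N0}", counter.NextValue());
                }
            }
        }
    }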

Addressing your observations:

creating objects ... the TaskManager show, as expected, some memory usage "jumps"

Yes, you are using RAM so the working set goes up.

These mem-usage "jumps" also remains executing AFTER user interaction ended

No slam dunk, but other processes get more time to execute, using RAM in turn and bumping some of yours out. Check the Page fault delta column.

I've used the ANTS memory profiler, but somehow it prevents those "jumps" from happening after user interaction.

Yes, memory profilers focus on the real memory usage of your program, the virtual memory kind. They largely ignore working set; there isn't anything you can do about it, and the number is meaningless because it really depends on what other processes are running.

there is the case that the memory usage grows and grows until collapse

That can be a side-effect of the garbage collector but that isn't typical. You are probably just seeing Windows trimming your working set, chucking out pages so you don't consume too many.

In a Windows XP Mode machine (VM in Win 7) with only 512MB of RAM Assigned it works fine

That's likely because you haven't installed any large programs on that VM that would compete for RAM. XP was also designed to work well on machines with very little memory, it is smooth on a machine with 256 MB. That's most definitely not the case for Vista/Win7, they were designed to take advantage of modern machine hardware. A feature like Aero is nice eye candy but very expensive.

The problem is worse when the system has some other heavy programs running

Yes, you are competing with those other processes needing lots of RAM.

Not even the first symbol of diagram can be created while the memory usage takes off like a rocket

Yes, you are seeing pages getting mapped back into RAM, reloaded from the paging file and the ngen-ed .ni.dll files, rapidly increasing your working set again. You'll also see the Page fault delta number peaking.

So, concluding: your WPF program just consumes a lot of memory and needs the horsepower to operate well. That's not easy to fix; it takes a pretty drastic redesign to lower the resource requirements. So just put the system requirements on the box, it is entirely normal to do so.

Upvotes: 6

Dan Busha

Reputation: 3803

It's hard to know exactly what's going on without seeing your code, but here are a few suggestions:

First, some information about the Garbage Collector. The most important thing to know is that the GC is non-deterministic: you can't know when it will run. Even calling GC.Collect() is only a suggestion, not a command. The Gen0 heap is for locally scoped objects and is collected frequently. If an object survives a Gen0 collection it is moved to the Gen1 heap. After a while the Gen1 heap is collected, and any object that survives that collection is moved to the Gen2 heap, which is collected less frequently. For this reason it is possible to see a saw-tooth pattern in a memory graph if you are allocating a lot of objects that make it into the Gen1 or Gen2 heap.
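To see promotion happen with your own eyes, here's a tiny sketch using GC.GetGeneration and GC.CollectionCount; the printed generations are the typical outcome, not a guarantee, precisely because the GC is non-deterministic:

    using System;

    class GenerationDemo
    {
        static void Main()
        {
            object survivor = new object();
            Console.WriteLine(GC.GetGeneration(survivor));   // 0: freshly allocated

            // We still hold a reference, so the object survives each collection
            // and gets promoted: Gen0 -> Gen1 -> Gen2.
            GC.Collect();
            Console.WriteLine(GC.GetGeneration(survivor));   // typically 1
            GC.Collect();
            Console.WriteLine(GC.GetGeneration(survivor));   // typically 2

            // How many collections of each generation have run in this process so far.
            Console.WriteLine("Gen0={0} Gen1={1} Gen2={2}",
                GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2));
        }
    }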

Using a tool like Process Explorer, examine the size of the managed heaps (Gen0, Gen1, Gen2 and the large object heap) and find out where the memory is being held. If you've got lots of short-lived objects churning through the Gen0 and Gen1 heaps, think of a way to reuse memory instead of re-allocating - something like an object pool works well for that.
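As an illustration only, a minimal pool might look like this (SimplePool is a hypothetical helper, not a framework type):

    using System.Collections.Concurrent;

    // Rent an instance instead of allocating a new one, and return it when done.
    public class SimplePool<T> where T : new()
    {
        private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();

        public T Rent()
        {
            T item;
            return _items.TryTake(out item) ? item : new T();
        }

        public void Return(T item)
        {
            _items.Add(item);
        }
    }

Usage would be along the lines of renting a StringBuilder (or whatever type you churn through), clearing it, and returning it to the pool instead of letting it become garbage.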

Also, try comparing the total size of the managed heaps against the total private bytes of your application. Private bytes includes both managed and unmanaged memory allocated by your application. If there is a big difference between the size of the managed heaps and the private bytes, it's likely that your application is allocating unmanaged objects (graphics objects, streams, etc.) that aren't getting disposed correctly. Look for objects that implement IDisposable but never have Dispose() called on them.
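A quick back-of-the-envelope version of that comparison can be done from inside the app with GC.GetTotalMemory and the Process class; treat the numbers as rough (the gap also contains the CLR's own bookkeeping, JIT-compiled code, loaded modules, etc.) and watch the trend rather than the absolute value:

    using System;
    using System.Diagnostics;

    class ManagedVsPrivate
    {
        static void Main()
        {
            // Approximate size of the managed heaps (true = collect first for a tighter number).
            long managed = GC.GetTotalMemory(true);

            // Private bytes: everything committed by the process, managed and unmanaged.
            long privateBytes = Process.GetCurrentProcess().PrivateMemorySize64;

            Console.WriteLine("Managed heaps: {0:N0} bytes", managed);
            Console.WriteLine("Private bytes: {0:N0} bytes", privateBytes);
            Console.WriteLine("Gap (mostly unmanaged): {0:N0} bytes", privateBytes - managed);
        }
    }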

Another problem could be heap fragmentation. If your application is allocating large objects which can't fit into the current heap, it will request more memory from the OS. A way to avoid this is to allocate smaller chunks of memory, or to allocate them in sequential blocks rather than randomly (think array vs linked list). A tool like the ANTS Memory Profiler should be able to tell you whether this is taking place.
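Purely to illustrate the "smaller, uniform chunks" idea (the 85,000-byte large object heap threshold is the documented default), something along these lines keeps individual allocations off the LOH:

    using System.Collections.Generic;

    class ChunkedBuffer
    {
        // Objects of 85,000 bytes or more land on the large object heap, which is not
        // compacted, so repeatedly allocating big arrays of varying sizes fragments it.
        // Splitting the data into fixed-size chunks keeps every allocation below that
        // threshold and makes the blocks uniform and reusable.
        private const int ChunkSize = 64 * 1024;            // 64 KB, safely under 85,000 bytes
        private readonly List<byte[]> _chunks = new List<byte[]>();

        public void EnsureCapacity(long bytes)
        {
            long needed = (bytes + ChunkSize - 1) / ChunkSize;
            while (_chunks.Count < needed)
                _chunks.Add(new byte[ChunkSize]);           // many small, uniform allocations
        }
    }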

@ReedCopsey's recommendation of PerfView (or its predecessor, the CLR Profiler) is a good one and will give you a better idea of how your memory is being allocated.

Upvotes: 2

Reed Copsey

Reputation: 564363

This would suggest that you're likely creating a lot of "garbage" - basically, creating many objects and letting them go out of scope quickly, but keeping them alive just long enough that they get promoted into Gen1 or Gen2. This puts a large burden on the GC, which in turn can cause freezes and hangs on many systems.

To see what is going on, I've used the ANTS memory profiler, but somehow it prevents those "jumps" from happening after user interaction.

The reason this profiler (ANTS), specifically, could mask this behavior is that it forces a full GC every time you take a memory snapshot. This would make it look like there is no memory "leak" (as there isn't), but does not show the total memory pressure on the system.

A tool like PerfView could be used to investigate the GC's behavior during the runtime of the process. This would help you track the number of GCs that occur, and your application state at that point in time.
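Before firing up PerfView, a lightweight in-app check can already tell you how fast the collection counters climb while you interact with the UI; GcLogger below is a hypothetical helper, and rapidly climbing Gen0/Gen1 counts point at allocation churn:

    using System;
    using System.Diagnostics;
    using System.Windows.Threading;

    public static class GcLogger
    {
        // Call GcLogger.Start() once at application startup, then watch the Output
        // window (or DebugView) while reproducing the slowdown.
        public static void Start()
        {
            var timer = new DispatcherTimer { Interval = TimeSpan.FromSeconds(1) };
            timer.Tick += (s, e) => Debug.WriteLine(string.Format(
                "Gen0={0} Gen1={1} Gen2={2} ManagedHeap={3:N0}",
                GC.CollectionCount(0), GC.CollectionCount(1),
                GC.CollectionCount(2), GC.GetTotalMemory(false)));
            timer.Start();
        }
    }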

Upvotes: 3
