Reputation: 839
I have a Stateful Service which consumes an increasing amount of memory until the service is restarted or the process is killed, at which point the memory is released.
See the diagram below; it looks like the typical saw-tooth pattern of memory leaking somewhere.
We used dotMemory to analyse the memory usage of a single node within the cluster, and it reported that the vast majority of the memory being consumed was unmanaged memory.
Just before we cycled the stateful service we took a memory dump file to see if we could learn anything further using WinDbg.
I'm no WinDbg expert, but I followed this article, which seemed to suggest that most of the memory was being consumed by the heap (http://hacksoflife.blogspot.co.uk/2009/06/heap-debugging-memoryresource-leak-with.html).
It also suggested enabling the user-mode stack trace database with `gflags.exe /i yourApplication.exe +ust` so that allocation call stacks are recorded, but I didn't do this prior to taking the .dmp file.
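For reference, the workflow the article describes looks roughly like the following (a sketch; the executable name and addresses are placeholders, and the process must be restarted after setting the flag for it to take effect):

```
rem Enable the user-mode stack trace database, then restart the service
gflags.exe /i YourService.exe +ust

rem After reproducing the memory growth, capture a dump and in WinDbg:
0:000> !heap -s                      <- identify the growing heap
0:000> !heap -stat -h <heap-base>    <- most common allocation sizes in it
0:000> !heap -flt s <size>           <- list busy entries of that size
0:000> !heap -p -a <entry-address>   <- allocation call stack (requires +ust)
```

Without `+ust` having been enabled before the dump was taken, the last step has no stack trace database to read from, which is why the existing dump can show *what* is allocated but not *who* allocated it.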
Is there anyone who could help me diagnose the issue using the dmp file I have?
Could someone validate whether the steps mentioned in the article would be worth following to try to find this issue?
Has anyone experienced this kind of issue with Stateful Services before?
ADDITIONAL INFO:
Here is an image of the inspections report from dotMemory. Regarding the object leak inspection, I need to re-check the code; I don't remember us instantiating those objects ourselves.
Here is the output I got from running !address -summary
0:000> !address -summary
Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free 865 7ff7`4c03d000 ( 127.966 TB) 99.97%
Heap 85669 6`e0f8e000 ( 27.515 GB) 79.04% 0.02%
<unknown> 21045 1`962b3000 ( 6.346 GB) 18.23% 0.00%
Stack 559 0`2f8c0000 ( 760.750 MB) 2.13% 0.00%
Image 1156 0`0d170000 ( 209.438 MB) 0.59% 0.00%
Other 9 0`001c7000 ( 1.777 MB) 0.00% 0.00%
TEB 189 0`0017a000 ( 1.477 MB) 0.00% 0.00%
PEB 1 0`00001000 ( 4.000 kB) 0.00% 0.00%
--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE 106665 8`a403b000 ( 34.563 GB) 99.28% 0.03%
MEM_IMAGE 1906 0`0ecd1000 ( 236.816 MB) 0.66% 0.00%
MEM_MAPPED 57 0`012a7000 ( 18.652 MB) 0.05% 0.00%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 865 7ff7`4c03d000 ( 127.966 TB) 99.97%
MEM_COMMIT 96056 7`718c6000 ( 29.774 GB) 85.53% 0.02%
MEM_RESERVE 12572 1`426ed000 ( 5.038 GB) 14.47% 0.00%
--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE 53162 7`56794000 ( 29.351 GB) 84.31% 0.02%
PAGE_NOACCESS 41097 0`0a089000 ( 160.535 MB) 0.45% 0.00%
PAGE_EXECUTE_READ 120 0`08948000 ( 137.281 MB) 0.39% 0.00%
PAGE_READONLY 760 0`05996000 ( 89.586 MB) 0.25% 0.00%
PAGE_EXECUTE_READWRITE 423 0`01a3f000 ( 26.246 MB) 0.07% 0.00%
PAGE_WRITECOPY 259 0`00f4c000 ( 15.297 MB) 0.04% 0.00%
PAGE_READWRITE|PAGE_GUARD 185 0`0033b000 ( 3.230 MB) 0.01% 0.00%
PAGE_EXECUTE_WRITECOPY 50 0`00105000 ( 1.020 MB) 0.00% 0.00%
--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free 27e`d5230000 7d77`29c10000 ( 125.465 TB)
Heap 276`50c95000 0`00d3a000 ( 13.227 MB)
<unknown> 275`81a2d000 0`1e5d3000 ( 485.824 MB)
Stack 14c`1f200000 0`00800000 ( 8.000 MB)
Image 7ffc`5d014000 0`01083000 ( 16.512 MB)
Other 275`bf920000 0`00181000 ( 1.504 MB)
TEB f9`09a04000 0`00002000 ( 8.000 kB)
PEB f9`09bf0000 0`00001000 ( 4.000 kB)
Here is the output I get from !heap -s
0:000> !heap -s
************************************************************************************************************************
NT HEAP STATS BELOW
************************************************************************************************************************
LFH Key : 0x8b79585e7994c063
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-------------------------------------------------------------------------------------
00000275bf510000 00000002 28701700 28609240 28701500 189846 42173 1804 5 2456d24 LFH
Lock contention 38104356
00000275bf3b0000 00008000 64 4 64 2 1 1 0 0
00000275bf780000 00001002 1280 368 1080 100 8 2 0 0 LFH
00000275bf710000 00001002 1280 388 1080 109 7 2 0 0 LFH
00000275bfc80000 00001002 1280 264 1080 7 9 2 0 0 LFH
00000275bfe70000 00041002 60 8 60 5 1 1 0 0
00000275d8730000 00041002 260 68 60 14 2 1 0 0 LFH
00000275d89a0000 00001002 31792 15028 31592 3404 244 14 3 106 LFH
External fragmentation 22 % (244 free blocks)
00000275d8950000 00001002 80356 19512 80156 17801 91 36 0 22 LFH
External fragmentation 91 % (91 free blocks)
00000275d8930000 00001002 1280 104 1080 29 4 2 0 0 LFH
00000275b2610000 00001002 1280 532 1080 62 15 2 0 1 LFH
00000275b0be0000 00001002 1280 88 1080 15 4 2 0 1 LFH
00000275b2840000 00001002 1280 556 1080 48 16 2 0 1 LFH
00000275b2bc0000 00001002 1280 92 1080 18 5 2 0 0 LFH
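In this output the first heap (base `00000275bf510000`) accounts for nearly all of the ~27.5 GB, and its lock contention count is also enormous, so that heap is the one worth drilling into. A plausible next step (this is how one would typically arrive at a filter size like `228` below) is:

```
0:000> !heap -stat -h 00000275bf510000
```

This prints a histogram of allocation sizes for that heap, after which `!heap -flt s <size>` lists the individual busy entries of the dominant size.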
UPDATE 07/12/2017:
Using the output from `!heap -flt s 228`,
we found that the heap contained many entries of the following type:
0000027d680d8d60 0023 0023 [00] 0000027d680d8d70 00228 - (busy)
? FabricClient!GetFabricClientDefaultSettings+4ba320
This led us to take a look at our BaseActor class, in which we create a FabricClient instance in the constructor using Lazy&lt;T&gt;
but never dispose of it, so I'm currently investigating the correct treatment of the FabricClient instance within the Actor lifetime.
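For illustration, a hypothetical reconstruction of the pattern described above (the class and member names are assumptions; `FabricClient` and the `Actor` base class are from the Service Fabric SDK):

```csharp
using System;
using System.Fabric;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Runtime;

public abstract class BaseActor : Actor
{
    // One FabricClient per actor instance, created lazily and never
    // disposed. FabricClient holds native (unmanaged) resources, so as
    // actors are created and destroyed, unmanaged memory accumulates
    // even though the managed heap looks healthy.
    private readonly Lazy<FabricClient> _fabricClient;

    protected BaseActor(ActorService actorService, ActorId actorId)
        : base(actorService, actorId)
    {
        _fabricClient = new Lazy<FabricClient>(() => new FabricClient());
    }

    protected FabricClient Fabric => _fabricClient.Value;
}
```

The key problem is the per-actor lifetime: each activation pays the native allocation cost, and nothing ever calls `Dispose()` on the client.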
Upvotes: 1
Views: 1872
Reputation: 839
With the help of the Microsoft support department, we found an issue with FabricClient.
Apparently there is a known issue with disposing of the FabricClient, which is due to be fixed in SDK 6.2.
For now we've migrated our code to use a static variable holding a single FabricClient instance per service.
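A minimal sketch of that workaround, assuming the same hypothetical BaseActor class as above (one shared client for the whole process, intentionally never disposed):

```csharp
using System;
using System.Fabric;
using System.Threading;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Runtime;

public abstract class BaseActor : Actor
{
    // Single FabricClient shared by every actor in the service. It is
    // created once, thread-safely, and lives for the lifetime of the
    // process, so the per-actor native allocations (and the disposal
    // issue) are avoided entirely.
    private static readonly Lazy<FabricClient> SharedFabricClient =
        new Lazy<FabricClient>(
            () => new FabricClient(),
            LazyThreadSafetyMode.ExecutionAndPublication);

    protected BaseActor(ActorService actorService, ActorId actorId)
        : base(actorService, actorId)
    {
    }

    protected FabricClient Fabric => SharedFabricClient.Value;
}
```

FabricClient is documented as thread-safe, which is what makes sharing a single instance across all actors in the process a reasonable pattern.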
Upvotes: 2