John Kattenhorn

Service Fabric Stateful Service consuming large amounts of unmanaged memory

I have a Stateful Service that consumes an ever-increasing amount of memory until the service is restarted or the process is killed, at which point the memory is released.

See the diagram below; it shows the typical saw-tooth pattern of a memory leak somewhere.

[Image: memory usage graph]

We used dotMemory to analyse memory usage on a single node in the cluster, and it reported that the vast majority of the consumed memory was unmanaged.

[Image: dotMemory profile]

Just before restarting the stateful service, we took a memory dump to see whether we could learn anything further using WinDbg.

I'm no WinDbg expert, but I followed this article, and the results seemed to suggest that most of the memory was being consumed by the heap (http://hacksoflife.blogspot.co.uk/2009/06/heap-debugging-memoryresource-leak-with.html).

The article recommends enabling user-mode stack trace collection (gflags.exe /i yourApplication.exe +ust) so that allocation stack traces are recorded, but I didn't do this before taking the dump file.

Could anyone help me diagnose the issue using the dump file I have?

Could someone confirm whether the steps in the article are worth following to track down this issue?

Has anyone experienced this kind of issue with Stateful Services before?

ADDITIONAL INFO:

Here is an image of the inspections report from dotMemory. For the object leak inspection I need to re-check the code, as I don't remember us instantiating those objects ourselves.

Here is the output I got from running !address -summary:

0:000> !address -summary


Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                    865     7ff7`4c03d000 ( 127.966 TB)           99.97%
Heap                                  85669        6`e0f8e000 (  27.515 GB)  79.04%    0.02%
<unknown>                             21045        1`962b3000 (   6.346 GB)  18.23%    0.00%
Stack                                   559        0`2f8c0000 ( 760.750 MB)   2.13%    0.00%
Image                                  1156        0`0d170000 ( 209.438 MB)   0.59%    0.00%
Other                                     9        0`001c7000 (   1.777 MB)   0.00%    0.00%
TEB                                     189        0`0017a000 (   1.477 MB)   0.00%    0.00%
PEB                                       1        0`00001000 (   4.000 kB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE                          106665        8`a403b000 (  34.563 GB)  99.28%    0.03%
MEM_IMAGE                              1906        0`0ecd1000 ( 236.816 MB)   0.66%    0.00%
MEM_MAPPED                               57        0`012a7000 (  18.652 MB)   0.05%    0.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                865     7ff7`4c03d000 ( 127.966 TB)           99.97%
MEM_COMMIT                            96056        7`718c6000 (  29.774 GB)  85.53%    0.02%
MEM_RESERVE                           12572        1`426ed000 (   5.038 GB)  14.47%    0.00%

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE                        53162        7`56794000 (  29.351 GB)  84.31%    0.02%
PAGE_NOACCESS                         41097        0`0a089000 ( 160.535 MB)   0.45%    0.00%
PAGE_EXECUTE_READ                       120        0`08948000 ( 137.281 MB)   0.39%    0.00%
PAGE_READONLY                           760        0`05996000 (  89.586 MB)   0.25%    0.00%
PAGE_EXECUTE_READWRITE                  423        0`01a3f000 (  26.246 MB)   0.07%    0.00%
PAGE_WRITECOPY                          259        0`00f4c000 (  15.297 MB)   0.04%    0.00%
PAGE_READWRITE|PAGE_GUARD               185        0`0033b000 (   3.230 MB)   0.01%    0.00%
PAGE_EXECUTE_WRITECOPY                   50        0`00105000 (   1.020 MB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free                                    27e`d5230000     7d77`29c10000 ( 125.465 TB)
Heap                                    276`50c95000        0`00d3a000 (  13.227 MB)
<unknown>                               275`81a2d000        0`1e5d3000 ( 485.824 MB)
Stack                                   14c`1f200000        0`00800000 (   8.000 MB)
Image                                  7ffc`5d014000        0`01083000 (  16.512 MB)
Other                                   275`bf920000        0`00181000 (   1.504 MB)
TEB                                      f9`09a04000        0`00002000 (   8.000 kB)
PEB                                      f9`09bf0000        0`00001000 (   4.000 kB)
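The percentages in the Usage Summary can be reproduced from the hex sizes, which is a quick way to confirm where the memory actually is. The Python sketch below is an illustrative helper, not part of WinDbg: `windbg_hex_to_int` and `to_gib` are hypothetical names, the backtick in values such as 6`e0f8e000 is treated as a separator between the high and low 32 bits, and "GB" is assumed to mean 2^30 bytes (which agrees with the 27.515 GB printed for Heap).

```python
# Recompute the busy-memory breakdown from WinDbg's `!address -summary` output.

def windbg_hex_to_int(value: str) -> int:
    """Parse a WinDbg 64-bit hex value such as '6`e0f8e000'.
    The backtick only separates the high and low 32-bit halves."""
    return int(value.replace("`", ""), 16)


def to_gib(num_bytes: int) -> float:
    """Convert bytes to binary gigabytes (2**30), matching WinDbg's 'GB'."""
    return num_bytes / 2**30


# "Total Size" column of the busy rows in the Usage Summary above.
busy_regions = {
    "Heap": "6`e0f8e000",
    "<unknown>": "1`962b3000",
    "Stack": "0`2f8c0000",
    "Image": "0`0d170000",
    "Other": "0`001c7000",
    "TEB": "0`0017a000",
    "PEB": "0`00001000",
}

sizes = {name: windbg_hex_to_int(raw) for name, raw in busy_regions.items()}
total_busy = sum(sizes.values())

# Largest categories first, with each one's share of busy memory.
for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {to_gib(size):8.3f} GiB  {100 * size / total_busy:6.2f}% of busy")
print(f"{'total':<10} {to_gib(total_busy):8.3f} GiB")
```

Heap accounts for roughly 27.5 of the 34.8 GiB in use (about 79%). Because the managed GC heap is usually reported under <unknown> rather than Heap, this fits dotMemory's finding that the growth is in unmanaged memory.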

Here is the output I got from !heap -s:

0:000> !heap -s


************************************************************************************************************************
                                              NT HEAP STATS BELOW
************************************************************************************************************************
LFH Key                   : 0x8b79585e7994c063
Termination on corruption : ENABLED
          Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast 
                            (k)     (k)    (k)     (k) length      blocks cont. heap 
-------------------------------------------------------------------------------------
00000275bf510000 00000002 28701700 28609240 28701500 189846 42173  1804    5 2456d24   LFH
    Lock contention  38104356 
00000275bf3b0000 00008000      64      4     64      2     1     1    0      0      
00000275bf780000 00001002    1280    368   1080    100     8     2    0      0   LFH
00000275bf710000 00001002    1280    388   1080    109     7     2    0      0   LFH
00000275bfc80000 00001002    1280    264   1080      7     9     2    0      0   LFH
00000275bfe70000 00041002      60      8     60      5     1     1    0      0      
00000275d8730000 00041002     260     68     60     14     2     1    0      0   LFH
00000275d89a0000 00001002   31792  15028  31592   3404   244    14    3    106   LFH
    External fragmentation  22 % (244 free blocks)
00000275d8950000 00001002   80356  19512  80156  17801    91    36    0     22   LFH
    External fragmentation  91 % (91 free blocks)
00000275d8930000 00001002    1280    104   1080     29     4     2    0      0   LFH
00000275b2610000 00001002    1280    532   1080     62    15     2    0      1   LFH
00000275b0be0000 00001002    1280     88   1080     15     4     2    0      1   LFH
00000275b2840000 00001002    1280    556   1080     48    16     2    0      1   LFH
00000275b2bc0000 00001002    1280     92   1080     18     5     2    0      0   LFH
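The !heap -s table is easier to read once the rows are ranked by committed memory. This is a hypothetical Python parser for the rows above; `parse_heap_summary` is an illustrative name, the Reserv, Commit, Virt and Free columns are taken to be in KB (per the "(k)" header), and the regular expression assumes the 16-digit heap address and 8-digit flags layout seen in this dump.

```python
import re

# Per-heap rows copied from the `!heap -s` output above. Indented detail lines
# ("Lock contention", "External fragmentation") are skipped by the parser.
HEAP_S_OUTPUT = """\
00000275bf510000 00000002 28701700 28609240 28701500 189846 42173  1804    5 2456d24   LFH
    Lock contention  38104356
00000275bf3b0000 00008000      64      4     64      2     1     1    0      0
00000275bf780000 00001002    1280    368   1080    100     8     2    0      0   LFH
00000275bf710000 00001002    1280    388   1080    109     7     2    0      0   LFH
00000275bfc80000 00001002    1280    264   1080      7     9     2    0      0   LFH
00000275bfe70000 00041002      60      8     60      5     1     1    0      0
00000275d8730000 00041002     260     68     60     14     2     1    0      0   LFH
00000275d89a0000 00001002   31792  15028  31592   3404   244    14    3    106   LFH
    External fragmentation  22 % (244 free blocks)
00000275d8950000 00001002   80356  19512  80156  17801    91    36    0     22   LFH
    External fragmentation  91 % (91 free blocks)
00000275d8930000 00001002    1280    104   1080     29     4     2    0      0   LFH
00000275b2610000 00001002    1280    532   1080     62    15     2    0      1   LFH
00000275b0be0000 00001002    1280     88   1080     15     4     2    0      1   LFH
00000275b2840000 00001002    1280    556   1080     48    16     2    0      1   LFH
00000275b2bc0000 00001002    1280     92   1080     18     5     2    0      0   LFH
"""

# Heap address, flags, then Reserv/Commit/Virt/Free in KB.
ROW = re.compile(r"^([0-9a-f]{16})\s+([0-9a-f]{8})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_heap_summary(text):
    """Return one dict per heap row; non-matching lines are ignored."""
    heaps = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            addr, flags, reserve_kb, commit_kb, virt_kb, free_kb = m.groups()
            heaps.append({
                "heap": addr,
                "flags": flags,
                "reserve_kb": int(reserve_kb),
                "commit_kb": int(commit_kb),
                "free_kb": int(free_kb),
            })
    return heaps


heaps = parse_heap_summary(HEAP_S_OUTPUT)
total_commit_kb = sum(h["commit_kb"] for h in heaps)

# Show the three heaps with the most committed memory.
for h in sorted(heaps, key=lambda h: h["commit_kb"], reverse=True)[:3]:
    share = 100 * h["commit_kb"] / total_commit_kb
    print(f'{h["heap"]}  commit {h["commit_kb"] / 2**20:7.3f} GiB  ({share:5.1f}% of listed commit)')
```

Almost all of the committed heap memory (about 27.3 GiB, over 99% of the listed commit) sits in the first heap, 00000275bf510000, which is typically the process default heap. A natural next step is `!heap -stat -h 00000275bf510000`, which prints a histogram of allocation sizes in that heap; that is one way to arrive at a size to filter on with `!heap -flt s`.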


UPDATE 07/12/2017:

Using the output from !heap -flt s 228, we found a heap with a large number of entries like the following:

0000027d680d8d60 0023 0023 [00] 0000027d680d8d70 00228 - (busy) ? FabricClient!GetFabricClientDefaultSettings+4ba320
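With the full !heap -flt output saved to a text file (for example with .logopen in WinDbg), a quick tally of busy blocks per symbol shows which allocation dominates. This is a hypothetical Python sketch: `summarize_blocks` and the file name in the comment are illustrative, the regular expression assumes the column layout of the line above (HEAP_ENTRY, Size, Prev, [Flags], UserPtr, UserSize, state, then an optional `? symbol`), and it assumes that blocks sharing the same symbol are the same kind of object, which appears to be the case here.

```python
import re
from collections import Counter, defaultdict

# One entry line from `!heap -flt s <size>`:
# HEAP_ENTRY  Size  Prev  [Flags]  UserPtr  UserSize - (state) ? symbol
ENTRY = re.compile(
    r"^(?P<entry>[0-9a-f]+)\s+(?P<size>[0-9a-f]+)\s+(?P<prev>[0-9a-f]+)\s+"
    r"\[(?P<flags>[0-9a-f]+)\]\s+(?P<user_ptr>[0-9a-f]+)\s+(?P<user_size>[0-9a-f]+)"
    r"\s+-\s+\((?P<state>[^)]+)\)(?:\s+\?\s+(?P<symbol>\S+))?"
)


def summarize_blocks(lines):
    """Count busy blocks and sum their user sizes (hex UserSize column) per symbol."""
    counts = Counter()
    total_bytes = defaultdict(int)
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m or not m["state"].startswith("busy"):
            continue  # skip headers, free blocks and anything unparsable
        symbol = m["symbol"] or "<none>"
        counts[symbol] += 1
        total_bytes[symbol] += int(m["user_size"], 16)
    return counts, total_bytes


# The single entry quoted in the question.
SAMPLE_LINE = ("0000027d680d8d60 0023 0023 [00] 0000027d680d8d70 00228 - (busy) "
               "? FabricClient!GetFabricClientDefaultSettings+4ba320")

counts, total_bytes = summarize_blocks([SAMPLE_LINE])
for symbol, n in counts.most_common():
    print(f"{n:8} blocks  {total_bytes[symbol]:12,} bytes  {symbol}")

# For the full log (hypothetical file name):
# with open("heap_flt_228.txt") as f:
#     counts, total_bytes = summarize_blocks(f)
```

Multiplying the block count by the 0x228 (552) byte user size gives a rough figure to compare against the growth in the default heap.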

This led us to look at our BaseActor class, where we create a FabricClient instance in the constructor using Lazy<T> but never dispose of it. I'm currently investigating how the FabricClient instance should be handled within the actor lifetime.


Answers (1)

John Kattenhorn

With the help of Microsoft Support, we found an issue with FabricClient.

Apparently there is a known issue with disposing of FabricClient, which is due to be fixed in SDK 6.2.

For now we've migrated our code to use a static variable to hold a single instance of FabricClient per service.
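The change from one FabricClient per actor to one shared instance can be illustrated as follows. This is a sketch in Python only as a language-neutral illustration of the pattern, not Service Fabric code: `ExpensiveClient` is a hypothetical stand-in for FabricClient, `BaseActor` mirrors the class named in the question, and a module-level, lock-protected lazy instance plays the role of the C# static field.

```python
import threading


class ExpensiveClient:
    """Hypothetical stand-in for FabricClient; imagine each instance holds native resources."""
    instances_created = 0

    def __init__(self):
        ExpensiveClient.instances_created += 1


_shared_client = None
_lock = threading.Lock()


def get_shared_client():
    """Create the client on first use, then hand the same instance to every caller."""
    global _shared_client
    if _shared_client is None:
        with _lock:
            if _shared_client is None:  # re-check so only one thread constructs it
                _shared_client = ExpensiveClient()
    return _shared_client


class BaseActor:
    """Hypothetical actor base class: borrows the shared client instead of owning one."""

    def __init__(self):
        self.client = get_shared_client()


actors = [BaseActor() for _ in range(1000)]
print(ExpensiveClient.instances_created)  # 1
```

In C#, a `static readonly` field or a static `Lazy<T>` gives the same once-per-process behaviour. Because the shared client lives as long as the process, individual actors never need to dispose of it, which sidesteps the disposal issue described above.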
