DaveO

Reputation: 1989

current logical threads increasing / thread stack is leaking

Monitoring my .NET app in Performance Monitor, I can see that the .NET CLR LocksAndThreads / # of current logical Threads counter increases steadily over time (currently 293), which indicates the thread stack is leaking.

I can find many articles which tell me this is the problem but nothing that tells me how to find the cause - so where do I start? Can Windbg tell me where the problem lies?

This is my Performance Monitor over 3 hours, showing the current logical thread count at 150:

[Screenshot: thread leak]

And this is the output of the threads window, which doesn't tell me much because I can't access their call stacks - they are mostly marked as [unavailable] or [In a sleep, wait or join] | [External Code]:

Unflagged       141024  124 Worker Thread   <No Name>       Normal
Unflagged   >   0   0   Unknown Thread  [Thread Destroyed]      
Unflagged       136272  2   Worker Thread   <No Name>       Highest
Unflagged       133060  7   Worker Thread   vshost.RunParkingWindow [Managed to Native Transition]  Normal
Unflagged       136952  10  Main Thread Main Thread [edited].Program.Main   Normal
Unflagged       134544  9   Worker Thread   .NET SystemEvents   [Managed to Native Transition]  Normal
Unflagged       136556  11  Worker Thread   Worker Thread   [edited].MessageService.ProcessJobs.AnonymousMethod__0  Normal
Unflagged       141364  113 Worker Thread   <No Name>   [In a sleep, wait, or join] Normal
Unflagged       140896  0   Worker Thread   [Thread Destroyed]      Normal
Unflagged       136776  19  Worker Thread   <No Name>   [In a sleep, wait, or join] Normal
Unflagged       135704  20  Worker Thread   <No Name>   [In a sleep, wait, or join] Normal
Unflagged       136712  21  Worker Thread   <No Name>   [In a sleep, wait, or join] Normal
Unflagged       134984  22  Worker Thread   <No Name>   [In a sleep, wait, or join] Normal
Unflagged       134660  23  Worker Thread   Worker Thread   [edited].BroadcastService.ProcessJobs.AnonymousMethod__1d   Normal
Unflagged       140224  152 Worker Thread   <No Name>       Normal
Unflagged       140792  157 Worker Thread   <No Name>       Normal
Unflagged       137116  0   Worker Thread   <No Name>       Normal
Unflagged       140776  111 Worker Thread   <No Name>       Normal
Unflagged       140784  0   Worker Thread   [Thread Destroyed]      Normal
Unflagged       140068  145 Worker Thread   <No Name>       Normal
Unflagged       139000  150 Worker Thread   <No Name>       Normal
Unflagged       140828  52  Worker Thread   <No Name>       Normal
Unflagged       137752  146 Worker Thread   <No Name>       Normal
Unflagged       140868  151 Worker Thread   <No Name>       Normal
Unflagged       141324  139 Worker Thread   <No Name>       Normal
Unflagged       140168  154 Worker Thread   <No Name>       Normal
Unflagged       141848  0   Worker Thread   [Thread Destroyed]      Normal
Unflagged       135544  153 Worker Thread   <No Name>       Normal
Unflagged       142260  140 Worker Thread   <No Name>       Normal
Unflagged       141528  142 Worker Thread   <No Name>   [In a sleep, wait, or join] Normal
Unflagged       141344  0   Worker Thread   [Thread Destroyed]      Normal
Unflagged       140096  136 Worker Thread   <No Name>       Normal
Unflagged       141712  134 Worker Thread   <No Name>       Normal
Unflagged       141688  147 Worker Thread   <No Name>       Normal

Update: I've since tracked the culprit down to a System.Timers.Timer. Even when this timer called an empty method on each Elapsed event it still raised the logical thread count indefinitely. Just changing the timer to a DispatcherTimer has fixed the problem.
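For context, a System.Timers.Timer with no SynchronizingObject raises Elapsed on thread-pool threads, so a timer that is never stopped or disposed keeps scheduling work there. The following is only a minimal sketch of one defensive pattern, not the code from my app: the class name PollingJob and the tick counter are made up for illustration. It uses AutoReset = false and re-arms the timer after each handler finishes so handlers cannot overlap, and it disposes the timer on shutdown.

```csharp
using System;
using System.Threading;
using System.Timers;

// Hypothetical wrapper (name invented for this sketch) around System.Timers.Timer.
public sealed class PollingJob : IDisposable
{
    private readonly System.Timers.Timer _timer;
    private int _ticks;

    public PollingJob(double intervalMs)
    {
        // AutoReset = false: the timer fires once and must be re-armed,
        // so two Elapsed handlers never run at the same time.
        _timer = new System.Timers.Timer(intervalMs) { AutoReset = false };
        _timer.Elapsed += OnElapsed;
    }

    // Number of Elapsed events handled so far.
    public int Ticks
    {
        get { return Volatile.Read(ref _ticks); }
    }

    public void Start()
    {
        _timer.Start();
    }

    private void OnElapsed(object sender, ElapsedEventArgs e)
    {
        try
        {
            // Real work would go here; this sketch only counts the tick.
            Interlocked.Increment(ref _ticks);
        }
        finally
        {
            // Re-arm only after the work is done. Starting a disposed timer
            // throws ObjectDisposedException, which is ignored on shutdown.
            try { _timer.Start(); }
            catch (ObjectDisposedException) { }
        }
    }

    public void Dispose()
    {
        _timer.Stop();
        _timer.Elapsed -= OnElapsed;
        _timer.Dispose();
    }
}
```

Usage would be along the lines of `using (var job = new PollingJob(1000)) { job.Start(); ... }`, so the timer is released when the owning scope ends.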

I started looking into all the timers in my application after seeing a large number when running !dumpheap -type TimerCallback in Windbg as mentioned in this question.

I'd still like to know how I could've detected this via Windbg debugging rather than the disable timers/check performance/repeat method that led me to the fix, i.e. anything that could've told me which timer was creating the problem.

Upvotes: 9

Views: 6078

Answers (2)

isaldarriaga

Reputation: 191

Try running all your long-running operations (database calls, disk or network access taking 100+ ms) asynchronously.

Use the async/await keywords introduced in .NET 4.5.

The thread pool adds threads when no thread is available at the moment a queued task is retrieved from the thread pool queue. If this trend continues on the server, you'll probably end up with thread pool starvation. Once the thread pool queue is full of tasks, .NET will reject further requests, so your application will have reached the limit of its scalability.

The await keyword turns your method into a workflow that frees the calling thread. Once the long-running operation is done, a new task is queued to the thread pool automatically, letting your application resume. Freeing and recycling threads this way keeps the # of current logical threads at a minimum, preventing starvation and reducing context switches between threads.
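As a rough illustration of the difference, here is a minimal sketch. Task.Delay stands in for a real database or network call, and the class and method names are invented for this example. The blocking version holds a thread for the whole wait; the awaited version holds none while waiting.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative only: names are hypothetical, Task.Delay simulates slow I/O.
public static class AsyncVsBlocking
{
    // Blocks the calling thread for the full duration of the "operation".
    public static void BlockingCall(int delayMs)
    {
        Thread.Sleep(delayMs);
    }

    // Returns immediately with a Task; no thread is held while waiting.
    public static Task NonBlockingCallAsync(int delayMs)
    {
        return Task.Delay(delayMs);
    }

    // Starts 'count' simulated operations and awaits them all.
    // The waits overlap without tying up 'count' threads.
    public static async Task<int> RunManyAsync(int count, int delayMs)
    {
        var tasks = new Task[count];
        for (int i = 0; i < count; i++)
        {
            tasks[i] = NonBlockingCallAsync(delayMs);
        }
        await Task.WhenAll(tasks);
        return count;
    }
}
```

Calling `await AsyncVsBlocking.RunManyAsync(50, 100)` completes all 50 simulated operations, whereas queuing 50 BlockingCall work items would occupy a thread-pool thread for each one that runs concurrently.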

Also, in .NET 4.5 a new algorithm weighs the cost and benefit of creating new threads inside the thread pool, keeping a reasonable balance between performance gains and context switching as the thread count grows. This is an additional benefit of moving to 4.5 if you haven't already done so.

So the first step is to identify your long running operations and then make them async.

You can verify this by correlating # of current logical threads with other counters (database client connections, disk IO reads, etc.). If the former rises whenever the others rise, this is likely the problem. Also check how long the operations take. As a general rule, 100 ms is a good threshold for calling an operation long-running.

Hope this helps.

Upvotes: -1

Hans Passant

Reputation: 941545

This is typically caused by having thread-pool threads getting stuck and not completing. Every half a second, the threadpool manager allows another thread to start to try to work down the backlog. This keeps going until it reaches the maximum number of threads as set by ThreadPool.SetMaxThreads(). By default a huge number, 1000 on a 4-core machine.
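To see how close the pool is to that ceiling from inside the process, something like the following sketch could be logged periodically. The class name ThreadPoolSnapshot is made up for illustration; the ThreadPool.Get* calls are the standard ones. A busy-worker count that keeps climbing while the app is idle would be consistent with stuck thread-pool threads.

```csharp
using System;
using System.Threading;

// Hypothetical helper for logging thread-pool usage; not part of the framework.
public static class ThreadPoolSnapshot
{
    // Worker threads currently in use = configured maximum minus available.
    public static int BusyWorkers()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return maxWorkers - freeWorkers;
    }

    // One-line summary suitable for a log file.
    public static string Describe()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo, minWorkers, minIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        ThreadPool.GetMinThreads(out minWorkers, out minIo);

        return string.Format(
            "workers busy={0} min={1} max={2}; IOCP busy={3} min={4} max={5}",
            maxWorkers - freeWorkers, minWorkers, maxWorkers,
            maxIo - freeIo, minIo, maxIo);
    }
}
```

For example, `Console.WriteLine(ThreadPoolSnapshot.Describe());` once a minute gives a trend line to compare against the Performance Monitor counter.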

Use Debug + Windows + Threads to look at the running threads. Their call stack should make it obvious why they are blocking.

Upvotes: 4
