Reputation: 78850
Is there any way to detect when attempted outbound connections are queuing?
Our ASP.NET application makes a lot of outbound requests to other web services. Recently we ran across major performance issues, where calls to a particular endpoint were taking many seconds to complete or timing out. The owners of that service did not see any performance issues on their end. When we analyzed the network traffic, we saw that indeed, the HTTP requests were completing in a timely manner. That's when we figured out that our long wait times and timeouts were due to connection queuing.
Our first approach for fixing this was to simply increase the number of allowed outbound connections to that endpoint, thusly:
<system.net>
  <connectionManagement>
    <add address="http://some.endpoint.com" maxconnection="96" />
  </connectionManagement>
</system.net>
This did drastically reduce the time our calls to the endpoint took. However, we noticed that it caused our overall inbound requests to take much longer to complete. That's when we came across Microsoft KB 821268. Following the "rule of thumb" guidelines there, we came up with these additional changes:
<processModel maxWorkerThreads="100" maxIoThreads="100" minWorkerThreads="50"/>
<httpRuntime minFreeThreads="704" minLocalRequestFreeThreads="608"/>
This appeared to fix everything. Our calls to some.endpoint.com were still fast, and our response times dropped as well.
A few days later, however, it was brought to our attention that our site was performing poorly, and we saw some SQL Server timeouts. Our DBA did not see anything amiss in the performance of the server, so this looked like something similar happening all over again. We're wondering if the increased connections to some.endpoint.com are causing other outbound calls to queue, maybe due to insufficient threads.
The worst part about this is that we haven't found a good technique to know definitively whether outbound connection queuing is taking place. All we've been able to do is observe the time between when we make the request and when we receive a response in our application. It's hard to know whether timeouts and long response times are due to queuing specifically.
Are there any effective tools for measuring and tuning outbound request throttling? Any other performance tuning tips would definitely be appreciated as well.
Upvotes: 5
Views: 4166
Reputation: 3864
The problem you are describing touches many areas of diagnostics, and I suppose there is no single tool that will tell you whether you suffer from contention or not. From your description it looks like you're depleting either the connection pool or the thread pool. This usually involves thread locking. Apart from the HttpWebRequest Average Queue Time performance counter pointed out by @Simon Mourier (remember to enable the System.Net counters with <performanceCounters enabled="true" /> in your config file), there are a few more to monitor. I would start with custom performance counters that monitor thread pool usage in your ASP.NET application. Unfortunately they are not included in the framework counters, but they are fairly simple to implement, as shown here. Additionally, I wrote a simple PowerShell script that groups thread states in your application. You may get it from here. It resembles the top command in Linux and will show you thread states or thread wait reasons for your processes. Have a look at the output for two applications (both named Program.exe):
one suffering from contention
> .\ThreadsTop.ps1 -ThreadStates -ProcMask Program
Threads states / process
Process Name Initialized Ready Running Standby Terminated Waiting Transition Unknown
------------ ----------- ----- ------- ------- ---------- ------- ---------- -------
Program 0 0 0 0 0 22 0 0
and the number of waiting threads constantly growing
> .\ThreadsTop.ps1 -ThreadWaitReasons -ProcMask Program
Legend:
0 - Waiting for a component of the Windows NT Executive| 1 - Waiting for a page to be freed
2 - Waiting for a page to be mapped or copied | 3 - Waiting for space to be allocated in the paged or nonpaged pool
4 - Waiting for an Execution Delay to be resolved | 5 - Suspended
6 - Waiting for a user request | 7 - Waiting for a component of the Windows NT Executive
8 - Waiting for a page to be freed | 9 - Waiting for a page to be mapped or copied
10 - Waiting for space to be allocated in the paged or nonpaged pool| 11 - Waiting for an Execution Delay to be resolved
12 - Suspended | 13 - Waiting for a user request
14 - Waiting for an event pair high | 15 - Waiting for an event pair low
16 - Waiting for an LPC Receive notice | 17 - Waiting for an LPC Reply notice
18 - Waiting for virtual memory to be allocated | 19 - Waiting for a page to be written to disk
Process Name 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
------------ - - - - - - - - - - -- -- -- -- -- -- -- -- -- --
Program 1 0 0 0 0 0 34 0 0 0 0 0 0 0 0 3 0 0 0 0
and other running normally:
> .\ThreadsTop.ps1 -ThreadStates -ProcMask Program
Threads states / process
Process Name Initialized Ready Running Standby Terminated Waiting Transition Unknown
------------ ----------- ----- ------- ------- ---------- ------- ---------- -------
Program 0 1 6 0 0 20 0 0
Here the number of waiting threads does not get higher than 24.
> .\ThreadsTop.ps1 -ThreadWaitReasons -ProcMask Program
Legend:
0 - Waiting for a component of the Windows NT Executive| 1 - Waiting for a page to be freed
2 - Waiting for a page to be mapped or copied | 3 - Waiting for space to be allocated in the paged or nonpaged pool
4 - Waiting for an Execution Delay to be resolved | 5 - Suspended
6 - Waiting for a user request | 7 - Waiting for a component of the Windows NT Executive
8 - Waiting for a page to be freed | 9 - Waiting for a page to be mapped or copied
10 - Waiting for space to be allocated in the paged or nonpaged pool| 11 - Waiting for an Execution Delay to be resolved
12 - Suspended | 13 - Waiting for a user request
14 - Waiting for an event pair high | 15 - Waiting for an event pair low
16 - Waiting for an LPC Receive notice | 17 - Waiting for an LPC Reply notice
18 - Waiting for virtual memory to be allocated | 19 - Waiting for a page to be written to disk
Process Name 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
------------ - - - - - - - - - - -- -- -- -- -- -- -- -- -- --
Program 1 0 0 0 0 0 18 0 0 0 0 0 0 0 0 6 0 0 0 0
Of course the number of threads will be much higher in your case, but you should be able to see a baseline pattern in thread behavior during "calm times" and peaks in the number of waiting threads when you suffer from contention.
You may freely modify my script so that it dumps this data somewhere other than the console (a database, for example). Finally, I would recommend running a profiler such as Concurrency Visualizer, which will give you more insight into thread behavior in your application. Enabling the system.net trace sources might also help, although the number of events can be overwhelming, so try to tune it accordingly.
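As a rough sketch of the two config-side pieces mentioned above (the System.Net performance counters and the system.net trace source), something like the following web.config fragment should work on the .NET Framework. The log file name network.log and the Warning level are my own assumptions, chosen to keep the event volume manageable; adjust both to your environment.

```xml
<configuration>
  <system.net>
    <settings>
      <!-- Publishes the System.Net counters, including
           "HttpWebRequest Average Queue Time" -->
      <performanceCounters enabled="true" />
    </settings>
  </system.net>
  <system.diagnostics>
    <sources>
      <!-- protocolonly skips raw hex dumps of the payload -->
      <source name="System.Net" tracemode="protocolonly" maxdatasize="1024">
        <listeners>
          <add name="System.Net" />
        </listeners>
      </source>
    </sources>
    <switches>
      <!-- Assumption: start at Warning; Verbose is far noisier -->
      <add name="System.Net" value="Warning" />
    </switches>
    <sharedListeners>
      <!-- Assumption: hypothetical log file name -->
      <add name="System.Net"
           type="System.Diagnostics.TextWriterTraceListener"
           initializeData="network.log" />
    </sharedListeners>
    <trace autoflush="true" />
  </system.diagnostics>
</configuration>
```

If the log is still too large, narrowing the switch level or tracing only for a short window around a slow period is probably the easiest way to tune it.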
Upvotes: 6
Reputation: 3025
What you're describing here is a very complex problem. Your app is basically in the middle of several other things (SQL Servers, web service providers, etc.) and you're trying to figure out what's slow. Is it your app, or someone else's that you're relying on?
I've gone down the road of trying to set up performance monitors and digging through logs, etc myself but found it very time consuming and difficult to visualize what is actually happening over a period of time. It's easy to collect data and look at a point in time. It's hard to look at all of the data and make it meaningful over a period of time, especially if there are a lot of connected systems involved.
If I were you I'd try the free NewRelic trial: http://newrelic.com/application-monitoring. I've used their product before and found it to be invaluable on problems like this.
Upvotes: 0
Reputation: 2397
Nagling is a TCP optimization on the sender, designed to reduce network congestion by coalescing small send requests into larger TCP segments. This is achieved by holding back small segments either until TCP has enough data to transmit a full-sized segment or until all outstanding data has been acknowledged by the receiver.
However Nagling interacts poorly with TCP Delayed ACKs, which is a TCP optimization on the receiver. It is designed to reduce the number of acknowledgement packets by delaying the ACK for a short time. RFC 1122 states that the delay should not be more than 500ms and there should be an ACK for every second segment. Since the receiver delays the ACK and the sender waits for the ACK before transmitting a small segment, the data transfer can get stalled until the delayed ACK arrives.
Source here
It seems like your server is very "chatty", sending lots of requests and receiving lots of responses all the time. Try this:
ServicePointManager.UseNagleAlgorithm = false;
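If you would rather not change code, the same switch can, as far as I know, also be set in config on the .NET Framework through the servicePointManager element. This is only a sketch: it applies to the whole application, and I am assuming your other system.net settings (such as connectionManagement) live alongside it in the same section.

```xml
<system.net>
  <settings>
    <!-- Config counterpart of ServicePointManager.UseNagleAlgorithm = false -->
    <servicePointManager useNagleAlgorithm="false" />
  </settings>
</system.net>
```

Either way, measure before and after: disabling Nagle helps with many small writes, but it will not fix queuing caused by connection or thread limits.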
Upvotes: 0