TCC

ServiceModel error logged on server when closing client application: existing connection was forcibly closed by the remote host

I have a self-hosted WCF service and several client processes ... everything works well: clients start, make several service calls, and exit. However, on the server my error logs (where I forward Error-level trace messages from System.ServiceModel) get an entry every time a client application closes (the entry does not coincide with a service method call).

I'm using a custom tcp binding on .NET 4.5 ...

 <bindings>
  <customBinding>
    <binding name="tcp">
      <security authenticationMode="SecureConversation" />
      <binaryMessageEncoding compressionFormat="GZip" />
      <tcpTransport />
    </binding>
  </customBinding>
 </bindings>

The client derives from ClientBase, and I do call Close() on the client without issue. Several instances of the client are created and closed during operation with no errors.

My guess is that the client is keeping a socket open for re-use (a sensible optimization). Then at application exit, that socket is getting zapped.

Does this represent an error that I should fix? If it's not really an "error", can I nonetheless avoid the situation somehow so as not to put junk-to-ignore in my error logs?

The client binding configuration is exactly the same as the server's (naturally). Here is my calling code... note that I use the ServiceHelper class from this question.

using (var helper = new ServiceHelper<ARAutomatchServiceClient, ServiceContracts.IARAutomatchService>())
{
    return await helper.Proxy.GetBatchesAsync(startDate, DateTime.Today.AddDays(5));
}
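
For context, the ServiceHelper referenced above is essentially a disposable wrapper around the ClientBase-derived proxy. The following is only a rough sketch of the general pattern (not the exact class from the linked question): it creates the proxy, exposes it, and Closes or Aborts it on Dispose so the using block is safe.

// Rough sketch of the ServiceHelper pattern (not the exact class from the linked question):
// create the ClientBase-derived proxy, expose it, and Close/Abort it on Dispose.
public class ServiceHelper<TClient, TService> : IDisposable
    where TClient : ClientBase<TService>, new()
    where TService : class
{
    public TClient Proxy { get; private set; }

    public ServiceHelper()
    {
        Proxy = new TClient();
    }

    public void Dispose()
    {
        if (Proxy == null) return;
        try
        {
            if (Proxy.State != CommunicationState.Faulted)
                Proxy.Close();
            else
                Proxy.Abort();
        }
        catch (CommunicationException) { Proxy.Abort(); }
        catch (TimeoutException) { Proxy.Abort(); }
        catch { Proxy.Abort(); throw; }
        finally { Proxy = null; }
    }
}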

Specifically, the "Error" level trace events on the server that I am seeing contain these messages (stack traces and other elements cleaned for brevity):

System.ServiceModel Error: 131075 :

System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue.

System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host NativeErrorCode: 2746

Upvotes: 4

Views: 1809

Answers (1)

TCC

All of the unwanted error messages that I have seen in the ServiceModel trace logs come from either connections in the connection pool timing out on the server or the client dropping a connection when the client process exits.

If a pooled connection times out on the server, there are some trace messages written on the server immediately on timing out and then on the client when starting the next operation. These are "Error" level trace messages.

If the client process exits before closing the connection, you get a different Error level trace message on the server immediately when the client process exits.

The fact that these are Error level trace messages is particularly annoying because I typically log these even in production environments ... but it appears they should mostly just be ignored, since they are the result of a routine pooled connection timing out.

One such pooled-connection closing issue has been documented by Microsoft here:

http://support.microsoft.com/kb/2607014

The article above advises that ServiceModel handles the exception and that it is safe to ignore when you see it in the trace logs. That particular situation is recorded as an "Information" level event, which again does not bother me as much as the "Error" level events that I'm actually logging. I tried to "filter" these messages from the logs, but it was rather difficult.
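
If you do want to attempt filtering, one possible starting point (purely a sketch, assuming your ServiceModel trace source writes through a System.Diagnostics listener that you control) is a custom TraceFilter attached to that listener which suppresses the specific messages you have decided to ignore:

// Sketch only: a System.Diagnostics TraceFilter that drops trace events whose text
// contains a phrase you have decided is safe to ignore. Note that for ServiceModel
// traces the interesting text may live in the XML trace record (data1) rather than
// in formatOrMessage, so this is a best-effort heuristic, not a reliable filter.
public class IgnoreKnownNoiseFilter : TraceFilter
{
    public override bool ShouldTrace(TraceEventCache cache, string source,
        TraceEventType eventType, int id, string formatOrMessage,
        object[] args, object data1, object[] data)
    {
        if (eventType != TraceEventType.Error) return true;
        string text = (formatOrMessage ?? string.Empty) + (data1 ?? string.Empty);
        return !text.Contains("forcibly closed by the remote host");
    }
}

// Attach it to the listener in code, e.g.: myListener.Filter = new IgnoreKnownNoiseFilter();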

Naturally, you can avoid the situation altogether by explicitly closing the pooled connections (on the client) before they time out (on the server). For a client to close a connection in the connection pool (for a WCF binding with tcp transport), the only thing I know of that works is to explicitly Close the ChannelFactory instance. In fact, if you are not caching these instances (and not using ClientBase, which usually caches them for you), then you will have no problems!

If you DO want to cache your ChannelFactory instances, then you should at least explicitly close them before the application exits, which is not advice that I've seen ANYWHERE. Closing these before the client application exits will take care of one of the major sources of dropped sockets that get logged as ServiceModel "Error" traces on the server.

Here is a little code for closing a channel factory:

// Close the cached ChannelFactory gracefully when possible; otherwise Abort it.
try
{
    if (channelFactory != null)
    {
        if (channelFactory.State != CommunicationState.Faulted)
        {
            channelFactory.Close();
        }
        else
        {
            // A faulted factory cannot be closed cleanly; Abort releases it immediately.
            channelFactory.Abort();
        }
    }
}
catch (CommunicationException)
{
    channelFactory.Abort();
}
catch (TimeoutException)
{
    channelFactory.Abort();
}
catch (Exception)
{
    // Unexpected failure: abort the factory, then let the exception propagate.
    channelFactory.Abort();
    throw;
}
finally
{
    channelFactory = null;
}

Just where you call that code is a bit tricky. Internally I schedule it in AppDomain.ProcessExit to "make sure" it gets called, but I also suggest that consumers of my service base classes remember to call the "close cached factories" code explicitly sometime earlier than AppDomain.ProcessExit, since ProcessExit handlers are limited to ~3 seconds to complete. Of course processes can close abruptly and never call this, but that's OK as long as it doesn't happen often enough to flood your server logs.
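
For illustration, here is a minimal sketch of that ProcessExit hook; CloseAllCachedFactories is a hypothetical method standing in for whatever applies the Close/Abort code above to your cached ChannelFactory instances:

// Sketch only: last-resort cleanup when the process exits.
// CloseAllCachedFactories() is a hypothetical helper that applies the
// Close/Abort logic above to every cached ChannelFactory instance.
AppDomain.CurrentDomain.ProcessExit += (sender, e) =>
{
    // ProcessExit handlers only get a few seconds in total, so treat this as a
    // safety net and prefer an explicit call earlier during application shutdown.
    CloseAllCachedFactories();
};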

As far as the pooled connections timing out ... you can just raise the TCP transport "ConnectionPool" timeout value on the server to something very high (several days) and probably be OK in some situations. This would at least make it unlikely or infrequent that a connection would time out on the server. Note that having a shorter timeout value on the client doesn't appear to affect the situation in any way, so that setting might as well be left at the default. (Reasoning: the client's connection would only be observed as timed out the next time the client needs a connection, but by that time the server will either have timed out already and logged the error, or, if not, the client will close and create a new connection and restart the server timeout period. However, simply using the connection would also restart the server timeout period.)

So again, you must have a high enough connection pool timeout on the server, regardless of the client settings, to cover the period of inactivity on your client. You can further reduce the likelihood of a pooled connection timing out by reducing the size of the pool on the client (maxOutboundConnectionsPerEndpoint) so that the client doesn't open more connections than are really needed, leaving them to go unused and then eventually time-out on the server.

Configuring the connection pool for a binding has to be done in code for the built-in bindings (like netTcpBinding); a sketch of the code-based approach follows the sample below. For custom bindings you can do it in configuration like this (here I set a server to time out in 2 days, and to pool only 100 connections):

  <customBinding>
    <binding name="tcp">
      <security authenticationMode="SecureConversation"/>
      <binaryMessageEncoding compressionFormat="GZip"/>
      <tcpTransport>
        <connectionPoolSettings idleTimeout="2.00:00:00"
                                maxOutboundConnectionsPerEndpoint="100" />
      </tcpTransport>
    </binding>
  </customBinding>
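
For the built-in netTcpBinding case mentioned above, one way to reach the same settings in code (a sketch, assuming you are free to swap the binding for a CustomBinding that wraps it) is to convert the binding and edit the TCP transport element's ConnectionPoolSettings:

// Sketch: configure the connection pool for a netTcpBinding in code by wrapping it
// in a CustomBinding and editing the TcpTransportBindingElement it contains.
// (Requires System.ServiceModel and System.ServiceModel.Channels.)
var netTcp = new NetTcpBinding();
var custom = new CustomBinding(netTcp);

var tcpTransport = custom.Elements.Find<TcpTransportBindingElement>();
tcpTransport.ConnectionPoolSettings.IdleTimeout = TimeSpan.FromDays(2);
tcpTransport.ConnectionPoolSettings.MaxOutboundConnectionsPerEndpoint = 100;

// Use 'custom' (not the original netTcp instance) when adding the service endpoint.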

These two approaches together (raising the server-side timeout and closing ChannelFactory instances when clients exit) might solve your problem, or at least reduce the number of "safe to ignore" messages significantly. Be sure the server's connection pool timeout is AT LEAST as long as the client's, so that the connection will time out on the client first if it ever does time out on the server (that case appears to be handled more gracefully in ServiceModel, with fewer trace messages, and is exactly the situation referred to in the knowledge base article linked above).

On the server, you'll ideally want maxOutboundConnectionsPerEndpoint to be large enough to serve (number of clients) x (number of pooled connections per client). Otherwise you might end up with pool overruns on the server, which emit Warning level trace events. That's not too bad. If there are no connections available in the server's pool when a new client tries to connect, this generates a bunch of events on both the client and the server. In all of these cases (even if the pool on the server is constantly overrun) WCF will recover and function, just not optimally. That is my experience, at least ... it's possible that if the "LeaseTimeout" for a new connection expires while waiting for a server connection pool spot to open up (the default is 5 minutes), then it will just fail altogether? Not sure...

A final suggestion might be to periodically close your ChannelFactory objects and recycle the cached copy. This may have only a limited impact on performance, assuming the client doesn't try to use the service at exactly the moment the ChannelFactory instance is being recycled. For instance, you might schedule a recycle of each cached ChannelFactory instance 5 minutes after it was created (not after it was last used, since it might have multiple pooled connections, one of which has not been used for a while). Then set your connection pool timeout on the server to 10 minutes or so. But be sure the server timeout is a good bit longer than the ChannelFactory recycle period, because when you go to recycle the ChannelFactory you might have to wait until a pending operation completes (meanwhile some unused pooled connection may have just timed out on the server).
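
A minimal sketch of that kind of recycling, assuming the IARAutomatchService contract from the question, a hypothetical endpoint configuration name, and a hypothetical CloseFactory helper that wraps the Close/Abort code shown earlier:

// Sketch only: swap in a fresh ChannelFactory every five minutes and close the old one.
// "tcpEndpoint" is a hypothetical endpoint configuration name, and CloseFactory is a
// hypothetical helper that applies the Close/Abort block shown earlier.
private static ChannelFactory<IARAutomatchService> _factory =
    new ChannelFactory<IARAutomatchService>("tcpEndpoint");

private static readonly System.Threading.Timer _recycleTimer =
    new System.Threading.Timer(_ =>
    {
        var replacement = new ChannelFactory<IARAutomatchService>("tcpEndpoint");
        var old = System.Threading.Interlocked.Exchange(ref _factory, replacement);
        CloseFactory(old);
    }, null, TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(5));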

All of these things are micro-optimizations that may not be worth doing ... but if you log Error level ServiceModel trace events in production, you'll probably want to do SOMETHING (even if it is disabling connection pooling or ChannelFactory caching) or your logs will likely be swamped with "safe to ignore" errors.

Upvotes: 3
