Reputation: 199
In my current project we (I mean "project team") use WCF services hosted on IIS.
Here are some technical details which may be important:
The problem is that the WCF services sometimes become unavailable. When we try to reach them, we get a timeout error, and the only way to restore them is to restart the NetTcpActivator (Net.Tcp Listener Adapter) Windows service.
According to my colleague's theory, this error may be related to the problems described in this KB article:
FIX: Smsvchost.exe for the WCF service stops responding when you run a .NET Framework 4-based WCF service http://support.microsoft.com/kb/2536618
According to this article, SMSvcHost (the container process which hosts NetTcpActivator and the Port Sharing Service) hangs if it can't route a request to w3wp (the IIS worker process) within 60 seconds (a non-configurable timeout). Unfortunately, we have been unable to find a way to reproduce this error. For example, we limited SMSvcHost to 1 CPU core and 1 thread, raised the pending connections limit to 1M, and pushed it to 100% CPU load in user mode. It still didn't hang!
Sometimes our load tests lead to strange errors, but when we stop them, all services automatically recover to their normal state. Yet sometimes even a light load can hang NetTcpActivator!
In addition, this is not a new problem. My colleagues already ran into it 3 years ago (see this thread for additional information: http://forums.iis.net/t/1167668.aspx/1/10). Unfortunately, they never got an answer. The problem just disappeared after some configuration changes! And now it has come back on the new server.
I would really appreciate all your thoughts and ideas!
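For readers who want to try the same experiment: the pending connections limit for the Net.Tcp listener is normally set in SMSvcHost.exe.config. The fragment below is only a sketch. It assumes that `maxPendingConnections` is the limit meant above, and the 1M value simply mirrors the figure quoted in the question; it is not a recommendation.

```xml
<!-- SMSvcHost.exe.config (sketch; assumes maxPendingConnections is the limit meant above) -->
<configuration>
  <system.serviceModel.activation>
    <!-- 1000000 mirrors the "1M" from the question; other attributes are left at their defaults -->
    <net.tcp maxPendingConnections="1000000" />
  </system.serviceModel.activation>
</configuration>
```

The Net.Tcp Listener Adapter and Port Sharing services usually need a restart before changes to this file take effect.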
Upvotes: 8
Views: 37168
Reputation: 9766
Alright, after lots of research I tracked down the cause of our issue. There may be other scenarios where this occurs, but hopefully this will help some people. Microsoft is in the process of reproducing it in their labs and should have a fix eventually.
In our case, all the planets had to align. We had one .NET 4 integrated app pool for client and server (on a developer machine). The service was using an external config file for bindings (<bindings configSource="serviceModel.bindings.config" />), which was linked from another project and copied at build time with a custom build task added to the service's .csproj.
To reproduce the issue:
I don't know yet if w3wp or SMSvcHost is the culprit. Step #3 is critical, though I can't explain why yet. If you don't delete the file, all is fine. If you modify the file (so the created date stays the same), all is fine. If you move the config XML into the main Web.config file, all is fine. When the build task copies the file, the created date is updated, so I am guessing it's cached in some way and one of the processes detects the date change.
If you restart the SMSvcHost services (full stop, full start) once or twice, the client request will go through, and from then on you're fine.
So my guess for now is that this could be an issue right after a deployment, but if you make sure everything is running (and restart services as needed), you should be fine. You can also avoid using external/linked config files.
Once Microsoft tracks down the issue I will hopefully have more insight.
Final Update: I forgot to come back to this earlier. Microsoft essentially admitted they probably had a bug, but since there was a workaround and they had spent enough time on the ticket, they were closing it and not researching further. There appears to be some type of race condition when SMSvcHost starts up with the following setup (similar to what I posted earlier):
configSource
Linking the external config had nothing to do with it. The workaround was to not use configSource, which is what we are doing now.
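As a rough illustration of that workaround, the bindings can be declared inline in Web.config instead of being pulled in through configSource. This is a minimal sketch, not our actual configuration: the netTcpBinding section and the binding name `exampleTcpBinding` are placeholders for whatever the external file contained.

```xml
<!-- Web.config: bindings declared inline rather than via configSource -->
<configuration>
  <system.serviceModel>
    <!-- Before: <bindings configSource="serviceModel.bindings.config" /> -->
    <bindings>
      <netTcpBinding>
        <!-- "exampleTcpBinding" is a hypothetical name used only for illustration -->
        <binding name="exampleTcpBinding" />
      </netTcpBinding>
    </bindings>
  </system.serviceModel>
</configuration>
```

The contents of the external bindings file move under `<bindings>` unchanged; only the indirection through configSource goes away.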
Upvotes: 0