Reputation: 4344
I have my own Azure WebApp, and it has its own Azure SQL Server instance. I've noticed that when I do my WebApp development on my local machine, the connection to the SQL Server instance sometimes fails. I get this error: 'The specified network name is no longer available'. If I continue through the Exception (or re-run the WebApp) and then refresh the webpage, it connects just fine the next time. I don't have to wait minutes between attempts, either; it can work again after just a few seconds.
I have no clue why this happens. Sometimes it will work for an hour and then just stop working. I've done research online and the only stuff I've found is to have some sort of 'retry policy' in place. I'm not sure I need this since I have yet to see this happen in the production environment (knock on wood).
Could this be some sort of DNS issue? Anyone have suggestions on how to get past this? I can work through it but it is very annoying.
Upvotes: 2
Views: 1051
Reputation: 39807
The answer to this is going to be pretty broad, but welcome to the world of cloud computing, where Transient Faults are bound to happen. You are right about what you need to do, but wrong about why you shouldn't do it. Just because you haven't seen them in production does not mean that they are not happening and/or going to happen. Basically, if this app has any sort of stability requirements or more than a handful of people using it, then you really should 'Defensively Code' and bake this in up front.
What are Transient Faults?
Basically, they are faults that may occur from time to time due to things like packet loss, timeouts (either due to previously running queries or throttling), etc. A quote from the article linked below:
These are mainly network issues. If you perform an operation in Azure and sometimes it does not work however if you try after some time, it works well. This odd and non-predictable behavior is nothing but transient faults. When we talk about SQL Azure, transient faults are common. The article Azure SQL Database Resource Management - http://msdn.microsoft.com/library/azure/dn338083.aspx mentions the scenarios in which you may face transient fault of SQL Azure. I highly recommend you to go through the all this documents and related links. Very informative!! Now, very common transient error of SQL Azure is throttling.
Why do I have to worry about them now in the cloud?
Because most developers are used to developing with a SQL server on the same machine or at least on the same network. Response times and opportunities for packet loss are minimal because the distance is so short. Azure data centers are not in the same building as you, so the chance of Transient Errors increases.
How do I fix it?
Lots of ways, but I really recommend you read some articles around the implementation of the Transient Fault Handling Block and how to use it with SQL Azure. Here is one discussing how to enable it with Entity Framework. This framework allows you to specify how many times to retry, how often to retry, and what method you use to determine how often to retry (linear, as in every 10ms; Exponential-Backoff, as in 10ms, then 100ms, then 1000ms).
I wish I could just give you a code sample, but your problem is more than just a simple one line code fix. Also, anytime you use any of the cloud based providers (Sql Azure, Blob Storage, Service Bus), you should be aware of and implement the Transient Fault Handling Block.
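That said, the retry strategies themselves are easy to illustrate. What follows is only a rough, language-agnostic sketch in Python, not the Transient Fault Handling Block and not a drop-in fix for your app. The names `TransientError`, `backoff_delays_ms` and `with_retry` are made up for this example, and it assumes you have some way to tell a transient failure (like your dropped connection) apart from a real bug.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for faults worth retrying (e.g. a dropped connection)."""


def backoff_delays_ms(retries, base_ms=10, factor=10, linear=False):
    """Return the wait in milliseconds before each retry.

    linear:      the same delay every time (10, 10, 10, ...)
    exponential: the delay grows by `factor` each time (10, 100, 1000, ...)
    """
    if linear:
        return [base_ms] * retries
    return [base_ms * factor ** i for i in range(retries)]


def with_retry(operation, retries=3, base_ms=10, factor=10, linear=False,
               sleep=time.sleep):
    """Call `operation`, retrying only when it raises TransientError.

    After `retries` failed retries the last error is re-raised, so a
    persistent outage still surfaces instead of looping forever.
    """
    delays = backoff_delays_ms(retries, base_ms, factor, linear)
    for attempt in range(retries + 1):
        try:
            return operation()
        except TransientError:
            if attempt == retries:
                raise  # out of retries: let the caller see the failure
            sleep(delays[attempt] / 1000)  # convert ms to seconds
```

You would wrap each database call, for example `with_retry(lambda: run_query(sql))`, where `run_query` is a stand-in for your own data access code that raises `TransientError` on a retryable failure. In a real .NET app you would rely on the retry support that ships with your data access framework rather than hand-rolling a loop like this.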
Upvotes: 6