Reputation: 9387
Given an application that in parallel requests 100 urls at a time for 10000 urls, I'll receive the following error for 50-5000 of them:
The remote name could not be resolved: 'www.url.com'
I understand that the error means the DNS server was unable to resolve the host name. However, the number of urls that fail changes on every run (ranging from 50 to 5000).
Am I making too many requests too fast? And can I even do that? Running the same test on a much more powerful server shows that only 10 urls could not be resolved, which sounds much more realistic.
The code that does the parallel requesting:
var semp = new SemaphoreSlim(100); // at most 100 requests in flight
var uris = File.ReadAllLines(@"C:\urls.txt").Select(x => new Uri(x));
var tasks = new List<Task>();
foreach (var uri in uris)
{
    tasks.Add(Task.Run(async () =>
    {
        await semp.WaitAsync();
        try
        {
            var result = await Web.TryGetPage(uri); // Using HttpWebRequest
        }
        finally
        {
            semp.Release(); // release even if the request throws
        }
    }));
}
await Task.WhenAll(tasks); // don't exit before the requests finish
Upvotes: 3
Views: 3811
Reputation: 120528
I'll bet you didn't know that the DNS lookup of HttpWebRequest (which is the cornerstone of all .net http apis) happens synchronously, even when making async requests (annoying, right?). This means that firing off many requests at once causes severe ThreadPool strain and a large amount of latency, which can lead to unexpected timeouts.
If you really want to step things up, don't use the .net dns implementation: use a third party library to resolve hosts, create your webrequest with an ip instead of a hostname, then manually set the host header before firing off the request. You can achieve much higher throughput this way.
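Here's a minimal sketch of that approach. ThirdPartyDns.ResolveAsync is a hypothetical stand-in for whichever resolver library you pick; the rest uses only HttpWebRequest members that exist in .NET 4+:

using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;

static async Task<string> GetPageViaIpAsync(Uri uri)
{
    // Resolve the host yourself instead of letting HttpWebRequest do its
    // synchronous lookup. ThirdPartyDns.ResolveAsync is a placeholder for
    // whatever DNS library you choose.
    IPAddress ip = await ThirdPartyDns.ResolveAsync(uri.Host);

    // Point the request at the raw IP so no DNS lookup happens at all.
    var ipUri = new UriBuilder(uri) { Host = ip.ToString() }.Uri;
    var request = (HttpWebRequest)WebRequest.Create(ipUri);

    // Restore the original name in the Host header so virtual hosting still
    // routes the request correctly. (HTTPS needs extra care here, since
    // certificate validation is tied to the host name.)
    request.Host = uri.Host;

    using (var response = (HttpWebResponse)await request.GetResponseAsync())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        return await reader.ReadToEndAsync();
    }
}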
Upvotes: 5
Reputation: 108841
It does sound like you're swamping your local DNS server (in the jargon, your local recursive DNS resolver).
When your program issues a DNS resolution request, it sends a UDP datagram to port 53 on the local resolver. That resolver responds either by replying from its cache or by recursively resending the request to some other resolver that's been identified as possibly having the record you're looking for.
So, your multithreaded program is causing a lot of datagrams to fly around. Internet Protocol hosts and routers handle congestion and overload by dropping datagram packets. It's like handling a traffic jam on a bridge by bulldozing cars off the bridge. In an overload situation, some packets just disappear.
So, it's up to endpoint software using datagram protocols to try again if their packets get lost. That's a big part of what TCP does, and it's how TCP provides the illusion of an error-free stream of data even though it can only communicate with datagrams.
So, your program will need to try again when it gets resolution failures on some of its DNS requests. You're a datagram endpoint, so you own the responsibility for retrying. I suspect the .net library is giving you back failures because some of your datagrams got dropped and those requests timed out.
Now, here's the important thing. It is also the responsibility of a datagram endpoint program, like yours, to implement congestion control. TCP does this automatically using its sliding window system, with an algorithm called slow-start / exponential backoff. If TCP didn't do this, all internet routers would be congested all the time. This algorithm was dreamed up by Van Jacobson, and you should go read about it.
In the meantime you should implement a simple form of it in your bulk DNS lookup program. Here's how you might do that.
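For example, here's a rough sketch of an additive-increase / multiplicative-decrease loop with exponential backoff. tryResolveAsync is a stand-in for whatever single-host lookup you perform (returning false on failure), and the specific constants are just starting points to tune:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static async Task<List<string>> ResolveAllAsync(
    IEnumerable<string> hosts,
    Func<string, Task<bool>> tryResolveAsync,
    int maxAttempts = 5)
{
    var pending = new Queue<(string Host, int Attempts)>(
        hosts.Select(h => (h, 0)));
    var unresolved = new List<string>(); // hosts we gave up on
    int window = 100;  // how many lookups run in parallel per batch
    int delayMs = 0;   // backoff delay applied after a bad batch

    while (pending.Count > 0)
    {
        if (delayMs > 0)
            await Task.Delay(delayMs);

        // Take one window's worth of hosts and resolve them in parallel.
        var batch = new List<(string Host, int Attempts)>();
        while (batch.Count < window && pending.Count > 0)
            batch.Add(pending.Dequeue());

        var results = await Task.WhenAll(batch.Select(async item =>
            (item.Host, item.Attempts, Ok: await tryResolveAsync(item.Host))));

        var failures = results.Where(r => !r.Ok).ToList();
        if (failures.Count == 0)
        {
            // Clean batch: cautiously widen the window (additive increase).
            window = Math.Min(window + 10, 100);
            delayMs = 0;
        }
        else
        {
            // Drops detected: halve the window (multiplicative decrease)
            // and back off exponentially before retrying the failures.
            window = Math.Max(window / 2, 5);
            delayMs = delayMs == 0 ? 500 : Math.Min(delayMs * 2, 30000);
            foreach (var f in failures)
            {
                if (f.Attempts + 1 < maxAttempts)
                    pending.Enqueue((f.Host, f.Attempts + 1));
                else
                    unresolved.Add(f.Host); // permanently failing host
            }
        }
    }
    return unresolved;
}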
I had a similar project a while ago and this strategy worked well for me.
Upvotes: 3