malexmave

Reputation: 1300

Determining all required DNS Queries to show a website

I need to create a list of all DNS queries required to display a large number of sites (ideally up to 1 000 000). The list needs to map each query to the page that triggered it.

Example: Visiting google.com requires DNS queries for google.com, ssl.gstatic.com, apis.google.com and other hosts. My list would read something along the lines of

google.com:google.com,ssl.gstatic.com,apis.google.com,...

(exact format not relevant here)

I currently have two ideas on how to do this:

  1. Set up a DNS server with logging, and build a script that visits a given list of domains using that DNS server as its resolver
  2. Build a script that downloads the source code of each site (think Python's urllib2, for example), parses all embedded content and constructs a list of the queries that would be needed
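To make the second idea concrete, here is a minimal sketch using only the Python 3 standard library. The attribute list `URL_ATTRS`, the class `HostCollector` and the function `hosts_for_page` are names made up for illustration, and the set of attributes is an assumption that is certainly incomplete. It only sees hosts named in the static HTML, so anything loaded by scripts, CSS or plugins is missed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Attributes that commonly carry a URL the browser fetches on load.
# Assumption: this list is illustrative, not exhaustive.
URL_ATTRS = {"src", "href", "data", "poster"}
# An href on these tags is only followed on a click, so it is skipped.
SKIP_HREF_TAGS = {"a", "area"}


class HostCollector(HTMLParser):
    """Collects hostnames referenced by embedded content in one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value or name not in URL_ATTRS:
                continue
            if name == "href" and tag in SKIP_HREF_TAGS:
                continue
            # Resolve relative and scheme-relative URLs against the page URL.
            host = urlsplit(urljoin(self.base_url, value)).hostname
            if host:
                self.hosts.add(host)


def hosts_for_page(base_url, html):
    """Return the sorted hostnames a static parse of `html` would look up."""
    collector = HostCollector(base_url)
    collector.feed(html)
    collector.hosts.add(urlsplit(base_url).hostname)
    return sorted(collector.hosts)
```

Calling `hosts_for_page("http://google.com/", page_source)` gives the hosts for one page; iframes would need the same function applied again to the framed URL.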

Both ideas have problems though. Visiting 1 000 000 domains with a gap of 2 seconds between visits (so that queries can be attributed to the visited site afterwards), and about 1 second per page load (which is pretty optimistic), adds up to 3 000 000 seconds, i.e. over 34 days, and probably longer in practice. A parser, on the other hand, would need a complete list of every form of embedded content that can trigger a DNS query. It would also have to fetch some of the target URLs itself (think iframes), and some content could not be checked for further queries at all (think Flash content which connects to other servers).
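The timing estimate and the time-window attribution for the logging approach can be sketched as follows. This is only an illustration: `crawl_days` and `attribute_queries` are hypothetical helpers, the log is assumed to be available as `(timestamp, hostname)` pairs, and the `workers` parameter assumes each parallel worker has its own resolver and log so that queries stay attributable.

```python
import bisect


def crawl_days(sites, seconds_per_site, workers=1):
    """Wall-clock days needed to visit `sites` pages one after another."""
    return sites * seconds_per_site / workers / 86400


def attribute_queries(visits, queries):
    """Assign each logged query to the visit that was running at the time.

    visits:  list of (start_time, site), sorted by start_time
    queries: iterable of (timestamp, hostname) taken from the resolver log
    A query belongs to the latest visit that started at or before it.
    """
    starts = [start for start, _ in visits]
    result = {site: [] for _, site in visits}
    for timestamp, host in queries:
        index = bisect.bisect_right(starts, timestamp) - 1
        if index < 0:
            continue  # logged before the first visit started
        seen = result[visits[index][1]]
        if host not in seen:
            seen.append(host)
    return result
```

With these numbers, `crawl_days(1_000_000, 3)` comes out at roughly 34.7 days, while 50 isolated workers would bring it below one day.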

I'm kind of stuck here, and would appreciate some input on how to deal with this. It would be possible to shorten the list of URLs to maybe 100 000, but any less would dramatically reduce the usefulness of the result.

For context: I need this list for my bachelor thesis dealing with an attack strategy on a proposed DNS privacy extension.

Upvotes: 4

Views: 748

Answers (2)

JamesHannah

Reputation: 125

You can use PhantomJS to do this, as it provides an interface that will let you capture network requests and log them, something along the lines of this example.

You'd need to write some simple JavaScript. Note that PhantomJS is a standalone headless browser rather than a Node module, but you can still run several instances in parallel to gather the data you need within a reasonable time.
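Once the requests are logged, turning them into the list format from the question is a small post-processing step. The sketch below is in Python and assumes a made-up log format of one `page<TAB>request URL` line per captured request; `format_lines` is a hypothetical helper, not part of PhantomJS. Keep in mind that a requested host only implies a DNS lookup if it was not already cached.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def format_lines(log_lines):
    """Group requested hostnames by page, as 'page:host1,host2,...' strings.

    log_lines: iterable of 'page<TAB>request_url' strings (assumed format).
    Hosts keep the order in which they were first requested.
    """
    hosts = defaultdict(list)
    for line in log_lines:
        page, _, request_url = line.strip().partition("\t")
        host = urlsplit(request_url).hostname
        if host and host not in hosts[page]:
            hosts[page].append(host)
    return [f"{page}:{','.join(names)}" for page, names in hosts.items()]
```

Because each log line carries the page it belongs to, many pages can be loaded at the same time without the fixed gap between visits that the DNS-log approach needs.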

Upvotes: 1

scottr.nist

Reputation: 11

There is a tool that can do this and produce a graphical representation. It is called DNSpktflow (DNS Packet Flow) and is part of dnssec-tools.

It may not do exactly what you want, but it is open source, so you can see how they do it.

Upvotes: 1
