dalt

Reputation: 33

Is it possible to resolve a list of domains in parallel using two or more DNS servers?

I am totally new to Python and, to be honest, to programming in general. I wrote my first script, which resolves a list of domains, with Google's help and some luck, I guess.

The list contains about 100,000 domains, and I need to reduce the time it takes to process them, because this will be a recurring task and it currently takes about two hours. I could split the list and run a separate copy of the script on each part, but it would be great if I could configure two or more DNS servers and resolve from them in parallel. Or are there other ways to reduce the running time?

I have read the dnspython docs, but they are too complex for my Python skill level (which is ~0).

import socket
import dns.resolver

w = open ('/home/dalt/pyth/resolved.txt', "w")
x = open ('/home/dalt/pyth/not_resolved.txt', "w")
with open('/home/dalt/pyth/domains2.txt') as f:
    my_list = [line.strip() for line in f.readlines()]

resolver = dns.resolver.Resolver()
resolver.nameservers=[socket.gethostbyname('212.xxx.xxx.134')]

for domain in my_list:
    try:
        q = resolver.query(domain, 'A')
        for ipval in q:
            print(ipval, file=w)
    except dns.resolver.NXDOMAIN:
        print(domain, 'NXDOMAIN', file=x)
    except dns.resolver.NoNameservers:
        print(domain, 'NoNameservers',file=x)
    except dns.resolver.NoAnswer:
        print(domain, 'NoAnswer',file=x)
    except dns.name.BadEscape:
        print(domain, 'BadEscape',file=x)

# f is already closed by the with block; the output files are the ones left open
w.close()
x.close()

Upvotes: 3

Views: 1269

Answers (1)

Czaporka

Reputation: 2407

I'm not very experienced with networking, but I would guess that most of your script's execution time is spent communicating with the DNS server. That means your CPU is mostly just waiting for data, so you should be able to speed the task up by using multiple threads.
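A rough way to check this reasoning without touching the network is to replace the DNS query with a short sleep and time both approaches. This is only a sketch: `fake_lookup`, the 50 ms delay and the domain names are made up for illustration, and real query latency will vary.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_lookup(domain):
    """Stand-in for a DNS query: the thread just waits, as it would for a reply."""
    time.sleep(0.05)  # assumed 50 ms round trip, purely illustrative
    return domain


domains = ["example%d.com" % i for i in range(20)]

# One lookup at a time: the waits add up.
start = time.perf_counter()
sequential = [fake_lookup(d) for d in domains]
sequential_time = time.perf_counter() - start

# Ten lookups in flight at once: the waits overlap.
start = time.perf_counter()
with ThreadPool(processes=10) as pool:
    threaded = pool.map(fake_lookup, domains)
threaded_time = time.perf_counter() - start

print("sequential: %.2f s, threaded: %.2f s" % (sequential_time, threaded_time))
```

With 20 simulated lookups and 10 threads, the threaded run should take roughly a tenth of the sequential time, because the threads spend their time waiting rather than computing.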

The easiest way to do that is with a ThreadPool:

from multiprocessing.pool import ThreadPool
import socket

import dns.resolver

my_list = [
    "www.google.com",
    "www.facebook.com",
    "doesnt.exist",
]

resolver = dns.resolver.Resolver()
resolver.nameservers=[
    socket.gethostbyname("8.8.4.4"),
    socket.gethostbyname("8.8.8.8"),
]

w = open("resolved.txt", "w")
x = open("not_resolved.txt", "w")

def resolve(domain):
    try:
        q = resolver.query(domain, "A")
        for ipval in q:
            print(domain, ipval, file=w)
    except dns.resolver.NXDOMAIN:
        print(domain, "NXDOMAIN", file=x)
    except dns.resolver.NoNameservers:
        print(domain, "NoNameservers", file=x)
    except dns.resolver.NoAnswer:
        print(domain, "NoAnswer", file=x)
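    except dns.name.BadEscape:
        print(domain, "BadEscape", file=x)
    except dns.exception.Timeout:
        # without this, a single timed-out query would make pool.map raise
        print(domain, "Timeout", file=x)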

# pool.map blocks until every domain has been processed
with ThreadPool(processes=10) as pool:  # increasing this number may speed things up
    pool.map(resolve, my_list)

w.close()
x.close()

Results:

$ cat not_resolved.txt
doesnt.exist NXDOMAIN
$ cat resolved.txt
www.google.com 172.217.20.196
www.facebook.com 31.13.81.36

The code above doesn't attempt to distribute the list of domains among the available DNS servers, unless the dnspython package does so under the hood. But I would expect even a single DNS server to respond quickly to concurrent queries, because it probably uses multiple threads itself.
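If spreading the load across servers explicitly turns out to matter, one possible approach is to give each server its own resolver and hand out domains round-robin. dnspython appears to try the entries in `resolver.nameservers` in order and move on only when one fails, so a shared list mostly gives failover rather than load sharing. The sketch below is untested against a 100,000-domain list; the helper names (`assign_round_robin`, `make_lookup`, `resolve_all`, `write_results`) are invented for illustration. The workers only return results and the main thread does all the writing, so lines from different threads cannot interleave in the output files.

```python
from itertools import cycle
from multiprocessing.pool import ThreadPool

NAMESERVERS = ["8.8.8.8", "8.8.4.4"]  # example servers taken from the code above


def assign_round_robin(domains, servers):
    """Pair each domain with a server, cycling through the server list."""
    return list(zip(domains, cycle(servers)))


def make_lookup(server):
    """Return a function that resolves A records using only `server`."""
    # Imported here so the other helpers can be tried without dnspython installed.
    import dns.exception
    import dns.resolver

    resolver = dns.resolver.Resolver(configure=False)  # ignore /etc/resolv.conf
    resolver.nameservers = [server]
    # dnspython 2.x renamed query() to resolve(); fall back for older versions.
    ask = getattr(resolver, "resolve", None) or resolver.query

    def lookup(domain):
        try:
            return True, [str(ip) for ip in ask(domain, "A")]
        except dns.exception.DNSException as exc:
            return False, type(exc).__name__  # e.g. "NXDOMAIN", "NoAnswer"

    return lookup


def resolve_all(domains, servers=NAMESERVERS, lookup_factory=make_lookup, workers=10):
    """Resolve domains concurrently, spreading them evenly across servers."""
    lookups = {server: lookup_factory(server) for server in servers}

    def work(pair):
        domain, server = pair
        ok, value = lookups[server](domain)
        return domain, server, ok, value

    with ThreadPool(processes=workers) as pool:
        return pool.map(work, assign_round_robin(domains, servers))


def write_results(results, resolved, failed):
    """Write results from the main thread to two already-open files."""
    for domain, _server, ok, value in results:
        if ok:
            for ip in value:
                print(domain, ip, file=resolved)
        else:
            print(domain, value, file=failed)
```

To use it, open `resolved.txt` and `not_resolved.txt` for writing and call `write_results(resolve_all(my_list), w, x)`. Whether two servers actually beat one depends on how quickly each server answers and whether it rate-limits clients, so it is worth timing both setups on a sample of the list first.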

Upvotes: 2
