Reputation: 451
I'm trying to resolve a list of hostnames. The problem is that when I hit a nonexistent domain, it slows down the whole process. The code is a trivial for loop:
for domain in domains:
    try:
        if socket.gethostbyname(domain.split('@')[1]):
            file1.write(domain)
        else:
            file2.write(domain)
    except socket.gaierror:
        pass
I was wondering if there is a simple way to parallelize what is inside the for loop.
Upvotes: 1
Views: 336
Reputation: 66
You could use one of the Gevent examples, dns_mass_resolve.py. It also offers the useful option of setting a timeout for all queries.
from __future__ import with_statement
import sys
import gevent
from gevent import socket
from gevent.pool import Pool

N = 1000
# limit ourselves to max 10 simultaneous outstanding requests
pool = Pool(10)
finished = 0

def job(url):
    global finished
    try:
        try:
            ip = socket.gethostbyname(url)
            print ('%s = %s' % (url, ip))
        except socket.gaierror:
            ex = sys.exc_info()[1]
            print ('%s failed with %s' % (url, ex))
    finally:
        # count the job whether the lookup succeeded or failed
        finished += 1

# stop waiting after 2 seconds; False means no exception is raised on timeout
with gevent.Timeout(2, False):
    # range works on both Python 2 and 3 (the original used xrange)
    for x in range(10, 10 + N):
        pool.spawn(job, '%s.com' % x)
    pool.join()

print ('finished within 2 seconds: %s/%s' % (finished, N))
Upvotes: 5
Reputation: 15692
I don't know of a simple solution. Using multiple threads or processes would be complicated and probably wouldn't help much, because your execution speed is bound by I/O. I would therefore look at an async library such as Twisted. IReactorCore has a resolve method: http://twistedmatrix.com/documents/12.2.0/api/twisted.internet.interfaces.IReactorCore.html
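To illustrate the asynchronous approach, here is a minimal sketch that uses the standard library's asyncio as a stand-in. It is not Twisted's resolve method, and it assumes Python 3.7 or later. The names resolve, resolve_all, and the timeout parameter are made up for the illustration. Note that asyncio runs getaddrinfo in a thread pool behind the scenes, so the timeout only stops waiting for a slow lookup; it does not abort it.

```python
import asyncio
import socket


async def resolve(host, timeout=2.0):
    """Return the first IPv4 address for host, or None on failure or timeout."""
    loop = asyncio.get_running_loop()
    try:
        infos = await asyncio.wait_for(
            loop.getaddrinfo(host, None, family=socket.AF_INET), timeout
        )
    except (socket.gaierror, asyncio.TimeoutError):
        return None
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    return infos[0][4][0]


async def resolve_all(hosts, resolver=resolve):
    """Run all lookups concurrently and map each host to its result."""
    results = await asyncio.gather(*(resolver(h) for h in hosts))
    return dict(zip(hosts, results))
```

Calling asyncio.run(resolve_all(hosts)) with a list of hostnames should return a dict in which hosts that failed or timed out map to None.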
Upvotes: 1
Reputation: 27581
import thread

def resolve_one_domain(domain):
    ...

for domain in domains:
    # start_new_thread requires its arguments as a tuple, not a list
    thread.start_new_thread(resolve_one_domain, (domain,))
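A bounded variant of the same idea, as a rough sketch: it assumes Python 3 and uses concurrent.futures.ThreadPoolExecutor in place of the low-level thread module, so the number of threads is capped and the results can be collected. The function resolve_all and the max_workers and resolver parameters are hypothetical names for the illustration, and the body of resolve_one_domain is only a guess at what the elided function might do.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve_one_domain(domain, resolver=socket.gethostbyname):
    """Return (domain, ip), or (domain, None) if the lookup fails."""
    try:
        return domain, resolver(domain)
    except socket.gaierror:
        return domain, None


def resolve_all(domains, max_workers=20, resolver=socket.gethostbyname):
    """Resolve all domains concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the inputs
        return list(
            executor.map(lambda d: resolve_one_domain(d, resolver), domains)
        )
```

A slow or failing lookup then blocks only its own worker thread, and the caller can write the resolved and unresolved entries to separate files after resolve_all returns.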
Upvotes: 0