Reputation: 3183
I just played around a little bit with python and threads, and realized even in a multithreaded script, DNS requests are blocking. Consider the following script:
from threading import Thread
import socket

class Connection(Thread):
    def __init__(self, name, url):
        Thread.__init__(self)
        self._url = url
        self._name = name

    def run(self):
        print "Connecting...", self._name
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.setblocking(0)
            s.connect((self._url, 80))
        except socket.gaierror:
            pass  # not interested in it
        print "finished", self._name

if __name__ == '__main__':
    conns = []
    # all invalid addresses to see how they fail / check times
    conns.append(Connection("conn1", "www.2eg11erdhrtj.com"))
    conns.append(Connection("conn2", "www.e2ger2dh2rtj.com"))
    conns.append(Connection("conn3", "www.eg2de3rh1rtj.com"))
    conns.append(Connection("conn4", "www.ege2rh4rd1tj.com"))
    conns.append(Connection("conn5", "www.ege52drhrtj1.com"))

    for conn in conns:
        conn.start()
I don't know exactly how long the timeout is, but when I run this, each thread prints its "Connecting..." and "finished" lines before the next one even gets started. So my only guess is that this has to do with the GIL? Obviously the threads do not perform their task concurrently; only one connection is attempted at a time.
Does anyone know a way around this?
(asyncore doesn't help, and I'd prefer not to use Twisted for now.) Isn't it possible to get this simple little thing done with Python?
Greetings, Tom
I am on Mac OS X. I just had a friend run this on Linux, and he actually gets the results I was hoping for: his socket.connect() calls return immediately, even in a non-threaded environment. And even when he sets the sockets to blocking with a timeout of 10 seconds, all his threads finish at the same time.
Can anyone explain this?
Upvotes: 8
Views: 5229
Reputation: 26699
If it's suitable, you could use the multiprocessing module to enable process-based parallelism:
import multiprocessing, socket

NUM_PROCESSES = 5

def get_url(url):
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(0)
        s.connect((url, 80))
    except socket.gaierror:
        pass  # not interested in it
    return 'finished ' + url

def main(url_list):
    pool = multiprocessing.Pool(NUM_PROCESSES)
    for output in pool.imap_unordered(get_url, url_list):
        print output

if __name__ == "__main__":
    main("""
        www.2eg11erdhrtj.com
        www.e2ger2dh2rtj.com
        www.eg2de3rh1rtj.com
        www.ege2rh4rd1tj.com
        www.ege52drhrtj1.com
        """.split())
Upvotes: 2
Reputation: 414149
Send DNS requests asynchronously using Twisted Names:
import sys

from twisted.internet import reactor
from twisted.internet import defer
from twisted.names import client
from twisted.python import log

def process_names(names):
    log.startLogging(sys.stderr, setStdout=False)

    def print_results(results):
        for name, (success, result) in zip(names, results):
            if success:
                print "%s -> %s" % (name, result)
            else:
                print >>sys.stderr, "error: %s failed. Reason: %s" % (
                    name, result)

    d = defer.DeferredList(map(client.getHostByName, names), consumeErrors=True)
    d.addCallback(print_results)
    d.addErrback(defer.logError)
    d.addBoth(lambda _: reactor.stop())

reactor.callWhenRunning(process_names, """
    google.com
    www.2eg11erdhrtj.com
    www.e2ger2dh2rtj.com
    www.eg2de3rh1rtj.com
    www.ege2rh4rd1tj.com
    www.ege52drhrtj1.com
    """.split())
reactor.run()
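Each client.getHostByName call returns a Deferred that eventually fires with the resolved address (or a failure), and the DeferredList with consumeErrors=True waits for all of them rather than stopping at the first error, so every name is looked up concurrently in a single thread.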
Upvotes: 2
Reputation: 127447
On some systems, getaddrinfo is not thread-safe. Python believes that such systems include FreeBSD, OpenBSD, NetBSD, OS X, and VMS, and on those systems it maintains a lock specifically for the netdb functions (i.e. getaddrinfo and friends), so only one thread can resolve a name at a time.
So if you can't switch operating systems, you'll have to use a different (thread-safe) resolver library, such as Twisted's.
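Here is a minimal sketch, using only the standard library and a few of the hostnames from the question, that makes the lock visible by timing lookups from several threads. On one of the systems listed above the total tends to be close to the sum of the individual lookup times, while on a system with a thread-safe getaddrinfo the lookups overlap:

import socket
import threading
import time

# a few of the nonexistent hostnames from the question; each resolution
# has to fail in the resolver before the call returns
NAMES = ["www.2eg11erdhrtj.com", "www.e2ger2dh2rtj.com", "www.eg2de3rh1rtj.com"]

def resolve(name):
    start = time.time()
    try:
        # gethostbyname ends up in getaddrinfo, which is guarded by the
        # netdb lock on the platforms listed above
        socket.gethostbyname(name)
    except socket.gaierror:
        pass
    print "%s: %.2fs" % (name, time.time() - start)

threads = [threading.Thread(target=resolve, args=(n,)) for n in NAMES]
overall = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
print "total: %.2fs" % (time.time() - overall)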
Upvotes: 15