Reputation: 1284
I'm crawling some data from the web, and since the amount of data I need is huge, I end up making more than 500 simultaneous requests (through urllib.request.urlopen(url), pooled via multiprocessing).
The problem here is that the following error is thrown:
urllib.error.URLError: urlopen error Temporary failure in name resolution
After some research, I found that this problem is caused by connections that cannot be closed when there are too many requests in flight, but I haven't yet found a way to solve it.
Should I limit the number of simultaneous connections to some safe range, or change the urllib request configuration?
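For reference, a minimal sketch of what capping the pool would look like; the pool size, timeout, and URL list are placeholders, not my actual setup:

import multiprocessing
import urllib.error
import urllib.request

POOL_SIZE = 50  # assumed safe cap; tune for your machine and resolver

def fetch(url):
    try:
        # the timeout keeps a stuck lookup from pinning a worker forever
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read()
    except urllib.error.URLError as exc:
        return str(exc)  # caller decides whether to retry or log

if __name__ == "__main__":
    # placeholder targets; substitute the real crawl list
    urls = ["https://example.com/page/%d" % i for i in range(1000)]
    with multiprocessing.Pool(POOL_SIZE) as pool:
        results = pool.map(fetch, urls)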
Development environment:
Upvotes: 3
Views: 2893
Reputation: 78
Try using Session objects from the requests library. As noted in the documentation:
The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
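For illustration, a minimal sketch of that approach; the thread-pool size, timeout, and URLs are assumptions, not taken from the question. If you keep the multiprocessing approach instead, create one Session per worker process rather than sharing one:

import requests
from concurrent.futures import ThreadPoolExecutor

def fetch(session, url):
    try:
        # requests to the same host reuse a pooled TCP connection
        return session.get(url, timeout=30).status_code
    except requests.RequestException as exc:
        return str(exc)

if __name__ == "__main__":
    # placeholder targets
    urls = ["https://example.com/page/%d" % i for i in range(100)]
    with requests.Session() as session:
        with ThreadPoolExecutor(max_workers=20) as pool:
            results = list(pool.map(lambda u: fetch(session, u), urls))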
Maybe this other thread about efficient web scraping can help you out.
Upvotes: 1