Reputation: 5655
I'm trying to download a batch of URLs concurrently using the requests module together with Python's built-in multiprocessing library. When I combine the two, I get errors that don't look right. I sent 100 requests using 100 threads; usually about 50 of them succeed, while the other 50 fail with this message:
HTTPConnectionPool(host='www.reuters.com', port=80): Max retries exceeded with url:
/video/2013/10/07/breakingviews-batistas-costly-bluster?videoId=274054858&feedType=VideoRSS&feedName=Business&videoChannel=5&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+reuters%2FUSVideoBusiness+%28Video+%2F+US+%2F+Business%29 (Caused by <class 'socket.gaierror'>: [Errno 8] nodename nor servname provided, or not known)
Neither the "Max retries exceeded" part nor the "nodename nor servname provided" part looks right to me.
Here is my requests setup:
import requests
req_kwargs = {
    'headers': {'User-Agent': 'np/0.0.1'},
    'timeout': 7,
    'allow_redirects': True,
}
# I left out the multiprocessing code, but that part isn't important
resp = requests.get(some_url, **req_kwargs)
Does anyone know how to prevent this, or at least how to get further in debugging it?
Thank you.
Upvotes: 4
Views: 6025
Reputation: 1
[Errno 8] nodename nor servname provided, or not known
This simply means that the hostname www.reuters.com could not be resolved. Either add an entry mapping it to its IP address in your hosts file, or check your DNS/domain configuration.
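To confirm that name resolution is what fails, independent of requests, a small check along these lines may help. This is only a sketch: `can_resolve` is a hypothetical helper name, and the `resolver` parameter is an assumption added so the function can be exercised without network access. By default it calls the standard library's `socket.getaddrinfo`, which raises `socket.gaierror` when a name cannot be resolved.

```python
import socket


def can_resolve(hostname, port=80, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves, False if resolution fails.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    that a stand-in can be supplied for offline testing (an assumption of
    this sketch, not part of the original question).
    """
    try:
        resolver(hostname, port)
        return True
    except socket.gaierror:
        # Same exception class that appears in the error message above.
        return False
```

Calling `can_resolve('www.reuters.com')` from many threads at once, the same way the downloads are issued, would show whether lookups only start failing under concurrency.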
Upvotes: 0
Reputation: 9806
I think it may be caused by requesting the site more frequently than it allows.
Try the following:
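One possible way to lower the request rate is sketched here, under assumptions and not necessarily what was originally suggested: cap the number of requests in flight and retry failures with exponential backoff. The names `fetch_with_retry` and `download_all`, and all default values, are hypothetical. `fetch` is any callable that takes a URL and returns a response.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_retry(fetch, url, retries=3, base_delay=0.5,
                     retry_on=(OSError,), sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on failure."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)


def download_all(urls, fetch, max_workers=10, **retry_kwargs):
    """Fetch all urls with at most `max_workers` requests in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_with_retry, fetch, url, **retry_kwargs)
                   for url in urls]
        # Results come back in the same order as `urls`.
        return [future.result() for future in futures]
```

With requests, `fetch` could be something like `lambda url: requests.get(url, **req_kwargs)`. The exceptions requests raises derive from `IOError` (an alias of `OSError` in Python 3), so the default `retry_on` should cover connection failures such as the one in the question. Starting with a `max_workers` well below 100 and raising it gradually would show whether the failures depend on the concurrency level.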
Upvotes: 2