Reputation: 11
I wrote a web crawler that uses urllib2 to crawl product information from www.amazon.com, but it seems that Amazon limits each IP to one connection.
When I start more than one thread to crawl simultaneously, it raises HTTP Error 503: Service Temporarily Unavailable.
I want to start more threads to crawl faster, so how can I fix this error?
Upvotes: 1
Views: 218
Reputation: 5401
Use the Python requests module to make connections through proxy IPs. The code will look like this:
import requests

# Each value is a proxy URL, typically of the form http://host:port.
proxies = {
    "http": "<an HTTP proxy IP>",
    "https": "<an HTTPS proxy IP>",
}
response = requests.get("http://your_url.com", proxies=proxies)
You should be able to get HTTP and HTTPS proxy IPs from here. See this for more help.
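Since the question is about several threads crawling at once, here is a rough sketch of how the proxy idea might be spread across threads, rotating through a small pool in round-robin order. It uses only the standard library (urllib.request, the Python 3 successor of the urllib2 from the question); each pool entry has the same shape as the proxies mapping passed to requests.get above. The proxy addresses and the names PROXY_POOL, next_proxy, fetch and crawl are placeholders made up for illustration, not anything from Amazon or from a particular proxy provider.

```python
import itertools
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholder proxies; replace them with proxies you may use.
PROXY_POOL = [
    {"http": "http://proxy1.example:8080", "https": "http://proxy1.example:8080"},
    {"http": "http://proxy2.example:8080", "https": "http://proxy2.example:8080"},
]

_proxy_cycle = itertools.cycle(PROXY_POOL)
_cycle_lock = threading.Lock()


def next_proxy():
    """Return the next proxy mapping in round-robin order (thread-safe)."""
    with _cycle_lock:
        return next(_proxy_cycle)


def fetch(url, timeout=10):
    """Fetch url through the next proxy; return the body bytes, or None on failure."""
    handler = urllib.request.ProxyHandler(next_proxy())
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read()
    except OSError:
        # URLError, HTTPError (such as a 503) and socket timeouts
        # are all subclasses of OSError.
        return None


def crawl(urls, workers=2):
    """Fetch all urls with a small thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Calling crawl(list_of_urls) returns one entry per URL, with None where a request failed. Keeping workers no larger than the number of proxies is an assumption that fits the one-connection-per-IP behaviour described in the question; it is not a documented limit.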
Upvotes: 0
Reputation: 70863
You should probably switch to using the Amazon API for product queries.
Upvotes: 0