Reputation: 365
I am trying to scrape data from google and linkedin. Somehow it gave me this error:
*** httperror_seek_wrapper: HTTP Error 503: Service Unavailable
Can someone help advice how I solve this?
Upvotes: 2
Views: 9302
Reputation: 21
I encountered the same situation and tried using the sleep() function before every request to spread the requests a little. It looked like it was working fine but failed soon enough even with a delay of 2 seconds. What solved it finally was using:
with contextlib.closing(urllib.urlopen(urlToOpen)) as x:
#do stuff with x.
This I did because I thought opening too many requests keeps it open and had to closed. Nevertheless, it worked quite consistently with as less as 0.5s delay time.
Upvotes: 2
Reputation: 1573
Google is simply detecting your query as automated. You would need a captcha solver to get unlimited results. The following link might be helpful.
https://support.google.com/websearch/answer/86640?hl=en
Bypassing Captcha using an OCR Engine:
http://www.debasish.in/2012/01/bypass-captcha-using-python-and.html
Simple Approach:
An even simpler approach is to simply use sleep() a few times and to generate random queries. This way google will not spot that you are using an automated system. But the system is far slower ...
Error Handling:
To simply get remove the error message use try and except
Upvotes: 2