Tom

Reputation: 3336

Python: urllib.error.HTTPError: HTTP Error 525: Origin SSL Handshake Error

I am using Python 3 to crawl many web pages on one website with urllib.request.build_opener. Each web_page_url is opened like below:

import urllib.request
from http.cookiejar import CookieJar

_masterOpener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))
_masterOpener.addheaders = [('Cookie', some_cookie)]
request = _masterOpener.open(web_page_url)
content = request.read()

It always works smoothly for the first few hundred pages, for about 10 minutes (I tried a few times), and then an error like the one below occurs:

  File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 525: Origin SSL Handshake Error

I searched the web and failed to find a solution. How can I solve the 'urllib.error.HTTPError: HTTP Error 525: Origin SSL Handshake Error' problem described above?

Upvotes: 1

Views: 827

Answers (1)

plaes

Reputation: 32716

HTTP 5xx status codes indicate an error on the server side, and it is your responsibility to handle them gracefully (for example, so that they do not crash your crawler).

In this case, error 525 is Cloudflare-specific: it means the SSL handshake between Cloudflare and the origin web server failed.

So add a try...except clause to handle this error gracefully:

import urllib.error
import urllib.request
from http.cookiejar import CookieJar

content = None
try:
    _masterOpener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))
    _masterOpener.addheaders = [('Cookie', some_cookie)]
    request = _masterOpener.open(web_page_url)
    content = request.read()
except urllib.error.HTTPError as e:
    if e.code != 525:
        # TODO: ... handle all the other 5xx errors
        # Anything else: raise the original exception
        raise
    # Possible issue with Cloudflare: fall through and skip this page
    # TODO: Log warning about broken url (content stays None)
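
Since the question describes the error appearing only after several minutes of successful crawling, it may be transient, in which case retrying after a pause could help. The following is only a sketch under that assumption: `fetch_with_retry`, `RETRYABLE_CODES`, and the `max_attempts`, `base_delay`, and `sleep` parameters are hypothetical names that do not come from the question, and the delays are arbitrary starting points.

```python
import time
import urllib.error
import urllib.request
from http.cookiejar import CookieJar

# Assumption: only 525 is treated as retryable here; extend the set as needed.
RETRYABLE_CODES = {525}


def fetch_with_retry(opener, url, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Return the page body, or None if every attempt hit a retryable error."""
    for attempt in range(max_attempts):
        try:
            with opener.open(url) as response:
                return response.read()
        except urllib.error.HTTPError as e:
            if e.code not in RETRYABLE_CODES:
                raise  # unexpected status: let the caller decide
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    return None
```

It would be called with the opener from the question, for example `content = fetch_with_retry(_masterOpener, web_page_url)`, and a `None` result means the page was skipped after all attempts. If the error keeps coming back for every page, the cause is on the server or Cloudflare side, and retrying will not fix it.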

Upvotes: 2
