TFC

Reputation: 546

Python scraping: Error 54 'Connection reset by peer'

I wrote a simple script to fetch the HTML of multiple websites. I had no issues with the script until yesterday, when it suddenly started throwing the exception below.

Traceback (most recent call last):
  File "crowling.py", line 45, in <module>
    result = requests.get(url)
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/sessions.py", line 685, in send
    r.content
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/models.py", line 829, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/models.py", line 754, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(54, 'Connection reset by peer')", ConnectionResetError(54, 'Connection reset by peer'))

The main part of the script is this.

c = 0
#urls is the list of urls as strings
for url in urls:
    result = requests.get(url)
    c += 1
    with open('htmls/p{}.html'.format(c),'w',encoding='UTF-8') as f:
        f.write(result.text)

The list urls is generated by other code of mine, and I have checked that the URLs are correct. The timing of the exception is also not consistent: sometimes it stops at the 20th page, and sometimes it gets to the 80th before stopping. Since the exception appeared without any change to the code, I am guessing that it is caused by the Internet connection. Still, I want the script to run reliably. What could be causing this error?

Upvotes: 2

Views: 11577

Answers (1)

Mike67

Reputation: 11342

If you're sure the URLs are correct and it's an intermittent connection problem, you can just retry the connection after failure:

import time
import requests
from requests.exceptions import ChunkedEncodingError

c = 0
# urls is the list of URLs as strings
for url in urls:
    trycnt = 3  # max try count
    while trycnt > 0:
        try:
            result = requests.get(url)
            c += 1
            with open('htmls/p{}.html'.format(c), 'w', encoding='UTF-8') as f:
                f.write(result.text)
            trycnt = 0  # success
        except ChunkedEncodingError as ex:
            trycnt -= 1  # one attempt used up
            if trycnt <= 0:
                print("Failed to retrieve: " + url + "\n" + str(ex))  # done retrying
            else:
                time.sleep(0.5)  # wait 1/2 second, then retry
    # go to next URL
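
If you would rather keep the retry logic out of the loop, the same idea can be pulled into a small helper. This is only a sketch under some assumptions: retry_call, fetch, retry_on and sleep are names I made up for illustration, and the doubling delay is my own choice instead of the fixed 0.5 seconds above. It catches OSError by default because, as far as I know, the requests exceptions (including ChunkedEncodingError) derive from IOError/OSError; pass a narrower tuple if that is too broad for you.

```python
import time


def retry_call(fetch, attempts=3, delay=0.5, retry_on=(OSError,), sleep=time.sleep):
    """Call fetch() up to `attempts` times, waiting longer after each failure.

    All parameter names here are hypothetical, chosen for this sketch.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(delay * 2 ** attempt)  # delay, then 2x delay, then 4x delay, ...


# Hypothetical usage with requests (not executed here):
#   import requests
#   result = retry_call(lambda: requests.get(url, timeout=10))
```

Passing a timeout to requests.get is optional, but without one a stalled connection can hang instead of raising, so the retry would never kick in.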

Upvotes: 5
