TimeoutError: Large amount of data in requests python

Question

I am trying to write a script that scrapes a presentation from a SlideShare link and downloads it as a PDF.

The script works fine as long as the presentation has fewer than 20 slides. Is there an alternative to requests in Python that can do the job?

Here is the script:

import requests
from bs4 import BeautifulSoup
from PIL import Image
import io

URL_LESS = "https://www.slideshare.net/angelucmex/global-warming-2373190?qid=8f04572c-48df-4f53-b2b0-0eb71021931c&v=&b=&from_search=1"
URL="https://www.slideshare.net/tusharpanda88/python-basics-59573634?qid=03cb80ee-36f0-4241-a516-454ad64808a8&v=&b=&from_search=5"
r = requests.get(URL_LESS)

soup = BeautifulSoup(r.content, "html5lib")

imgs = soup.find_all('img', class_="slide-image")
imgSRC = [x.get("srcset").split(',')[0].strip().split(' ')[0].split('?')[0] for x in imgs]

imagesJPG = []
for img in imgSRC:
    im = requests.get(img)
    f = io.BytesIO(im.content)
    imgJPG = Image.open(f)
    imagesJPG.append(imgJPG)

imagesJPG[0].save(f"{soup.title.string}.pdf",save_all=True, append_images=imagesJPG[1:])

Change URL_LESS to URL in the requests.get call to reproduce the problem.

Here is the traceback:

Traceback (most recent call last):
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
    raise err
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 1040, in _validate_conn
    conn.connect()
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connection.py", line 358, in connect
    conn = self._new_conn()
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: : Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\adapters.py", line 440, in send
    resp = conn.urlopen(
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\util\retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='image.slidesharecdn.com', port=443): Max retries exceeded with url: /pythonbasics-160315100530/85/python-basics-8-320.jpg (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\Work\py\scrapingScripts\slideshare\main.py", line 16, in <module>
    im = requests.get(img)
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='image.slidesharecdn.com', port=443): Max retries exceeded with url: /pythonbasics-160315100530/85/python-basics-8-320.jpg (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

Shacklebolt13 · Accepted Answer

The script worked perfectly for me with both URL and URL_LESS, so your internet connection might be the culprit here.

My guesses are:

You have a slow or inconsistent internet connection.
SlideShare is blocking your IP address or user agent, perhaps as DDoS protection (unlikely).
You're using IPv6, which has been the culprit in cases like this for me; try switching your network to IPv4 only.
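
If the connection is merely flaky, retrying each image download with a growing pause may be enough to get through a long slide deck. Below is a minimal sketch of that idea, not something I ran against SlideShare: fetch_with_retries, its parameters, and the delay values are hypothetical names chosen for illustration, and it assumes the failures surface as OSError subclasses (which covers requests' exceptions as well as the built-in TimeoutError).

```python
import time


def fetch_with_retries(fetch, url, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on connection-level errors with exponential backoff.

    fetch is any callable that takes a URL and returns a response.
    This helper and its parameters are hypothetical, not part of requests.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            # requests.exceptions.RequestException subclasses OSError, and so do
            # the built-in TimeoutError and ConnectionError.
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the last error
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

In the loop from the question, the download line could then become something like im = fetch_with_retries(lambda u: requests.get(u, timeout=10), img). Passing an explicit timeout also keeps a stalled connection from hanging the script indefinitely.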

As for requests, I have personally used it to scrape a fairly large amount of data over a fairly long time, so I can say it's an amazing library to use.
