Harshith Thota

Reputation: 864

Ridiculously low download speed with Python requests module

Problem:

I have been trying to make a simple anime downloader using Python's requests module. I am tracking the progress using the progressbar2 module. While downloading, the reported speed is 0.x B/s. I assumed the problem was the choice of chunk_size, based on this question, but I get the same negligible speed irrespective of chunk size.

Specs and info:

  1. I am using Windows 10, Python 3.5, and the latest requests module (2.18.4), and I have a decent internet connection with a speed of 40 Mbps.
  2. I can download the file from the link through a browser (Chrome) and Free Download Manager in about 1 minute.
  3. The link works perfectly and I have no firewall conflicts.

Code:

import os
import requests
import progressbar
from progressbar import *

os.chdir('D:\\anime\\ongoing')

widgets = ['Downloading: ', Percentage(), ' ', Bar(marker='#',left='[',right=']'),
           ' ', ETA(), FileTransferSpeed()]

url = 'https://lh3.googleusercontent.com/AtkUe87GbrINzTJS_Fj4W08CGqlOg9anwEF7n5-eKXcyS1RsaB8LdzRVaXloiJwiaX2IX1xqUiA=m22?title=(720P%20-%20mp4)Net-juu%20no%20Susume%20Episode%207'
r = requests.get(url,stream=True)
remotesize = r.headers['content-length']

print("Downloading {}.mp4!\n\n".format(url.split('title=')[1]))
pbar = ProgressBar(max_value=int(remotesize),widgets=widgets).start()
i = 0
with open('./tempy/tempy_file.mp4', 'wb') as f:
   for chunk in r.iter_content(chunk_size=5*1024*1024): 
      if chunk:
         i = i + len(chunk)
         f.write(chunk)
         pbar.update(int(i/int(remotesize) * 100))
pbar.finish()         
print("Successfully downloaded!\n\n")

Screenshot:

The speed is just ridiculous.

Expected Solution:

Not sure if this GitHub issue was fixed.

  1. It would be preferable to find a solution within requests module but I am open to any answers within the scope of Python that can get me a good speed.
  2. I want the download to be chunk-wise because I want to see the progress via the progressbar. So shutil.copyfileobj(r.raw) isn't what I'm looking for.
  3. I did try using multiple threads but it only complicated things and didn't help. I think the problem is with writing the chunk to the buffer itself and splitting this task between threads doesn't help.

Edit:

As per suggestion, I tried it by including random user agents as shown:

desktop_agents = ['Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14',
                 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
                 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
                 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0']

from random import choice

def random_headers():
    return {'User-Agent': choice(desktop_agents),'Accept':'text/html,video/mp4,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}

and sending the request with header as r = requests.get(url,stream=True,headers=random_headers())

However, it made no difference. :(

Edit no. 2:

Tried it with a sample video from "http://www.sample-videos.com/video/mp4/720/big_buck_bunny_720p_5mb.mp4". Same problem persists. :/

Upvotes: 4

Views: 6152

Answers (3)

Michael Ibanez

Reputation: 11

This was asked a while ago, but I believe that the issue observed by OP is not in the downloading but rather in the reporting. The progress bar is being filled up to a max value equal to the value of remotesize:

pbar = ProgressBar(max_value=int(remotesize),widgets=widgets).start()

However, inside the download loop, the bar is updated by taking the number of bytes downloaded so far, i, dividing it by remotesize, and multiplying by 100:

pbar.update(int(i/int(remotesize) * 100))

In essence, the first line sets up the bar to expect absolute values ranging from 0 to the total size of the file, while the second feeds it a percentage ranging from 0 to 100. So at any moment the bar is comparing a percentage against a byte count (e.g. 9 out of 321053151 bytes), and this is why the bar thinks that progress is extremely slow.

Hence either change the first line to:

pbar = ProgressBar(max_value=100,widgets=widgets).start()

Or, change the second line to:

pbar.update(i)
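As a minimal sketch of the second option, the loop below keeps a running byte count and reports that absolute value on every chunk. The helper name copy_with_progress and the stand-in chunk list are hypothetical and used only so the example runs without a network connection; in the original script the chunks would come from r.iter_content(...) and the report callback would be pbar.update, with max_value left at int(remotesize).

```python
import io


def copy_with_progress(chunks, out, report):
    """Write each chunk to `out` and report the running byte count.

    `report` receives absolute bytes written so far, which is the unit a
    progress bar created with max_value=<total file size> expects.
    """
    done = 0
    for chunk in chunks:
        if not chunk:  # skip empty keep-alive chunks
            continue
        out.write(chunk)
        done += len(chunk)
        report(done)  # absolute bytes, not a percentage
    return done


# Stand-in data (hypothetical) instead of a real HTTP response.
fake_chunks = [b"a" * 4, b"", b"b" * 4, b"c" * 2]
seen = []
buf = io.BytesIO()
total = copy_with_progress(fake_chunks, buf, seen.append)
print(total, seen)  # 10 [4, 8, 10]
```

Reporting bytes rather than a percentage should also let the FileTransferSpeed widget show a meaningful rate, since it then works from the same byte count as the bar.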

Upvotes: 0

Harshith Thota

Reputation: 864

As the others suggested, Google was throttling the speed. To get around this, I used the Selenium webdriver to download the links:

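from selenium import webdriver

# dir_name and li are not defined in this snippet; presumably they are the
# download folder and the file URL from the complete script.
chrome_options = webdriver.ChromeOptions()
prefs = {'download.default_directory': dir_name}
chrome_options.add_experimental_option('prefs', prefs)
# Newer Selenium releases take options=chrome_options instead.
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get(li)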

Well, at least I can now fully automate the download at the speed that Google Chrome's downloader achieves.

So if anyone can help me figure this one out, please reply in the comments and I'll upvote them if helpful:

  1. Figure out a way in Python to use multiple connections for each file, the way Free Download Manager does.
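One common way to use several connections per file is to split the download into byte ranges and request each with an HTTP Range header. The sketch below is only an illustration under assumptions: the server must support range requests (answering with status 206), the total size must be known in advance (for example from content-length), and the helper names split_ranges, fetch_range and download_parallel are hypothetical. It uses the standard library's urllib so it has no extra dependencies; passing the same Range header to requests.get(url, headers=...) should work equally well. Whether this avoids server-side throttling is not guaranteed.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, parts):
    """Split total_size bytes into up to `parts` inclusive (start, end) ranges."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for k in range(parts):
        # The first `extra` ranges take one additional byte each.
        length = base + (1 if k < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def fetch_range(url, start, end):
    """Fetch bytes start..end (inclusive) of url using a Range header."""
    req = urllib.request.Request(
        url, headers={"Range": "bytes={}-{}".format(start, end)}
    )
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:  # 206 Partial Content means the range was honoured
            raise RuntimeError("server did not honour the Range header")
        return resp.read()


def download_parallel(url, total_size, parts=4):
    """Fetch all ranges concurrently and join them in order (held in memory)."""
    ranges = split_ranges(total_size, parts)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        pieces = pool.map(lambda r: fetch_range(url, *r), ranges)
        return b"".join(pieces)


print(split_ranges(10, 3))  # [(0, 3), (4, 6), (7, 9)]
```

Note that download_parallel keeps the whole file in memory; for large videos each range would more sensibly be written to its own offset in the output file.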

Here's the link to the complete script.

Upvotes: 1

M. Matt

Reputation: 77

Have you tried filling your request headers with your user-agent and any other headers that Google may need in order not to flag you as a bot and limit your download speed?

Upvotes: 0
