Reputation: 864
I have been trying to make a simple anime downloader using Python's requests module. I am tracking the progress using the progressbar2 module. While trying to download, I'm getting a speed of 0.x B/s. I assumed the problem was the choice of chunk_size, based on this question, but I am getting the same negligible speeds irrespective of chunk size.
I'm using the requests module (2.18.4) and have a decent internet connection with a speed of 40 Mbps.
import os
import requests
import progressbar
from progressbar import *
os.chdir('D:\\anime\\ongoing')
widgets = ['Downloading: ', Percentage(), ' ', Bar(marker='#',left='[',right=']'),
' ', ETA(), FileTransferSpeed()]
url = 'https://lh3.googleusercontent.com/AtkUe87GbrINzTJS_Fj4W08CGqlOg9anwEF7n5-eKXcyS1RsaB8LdzRVaXloiJwiaX2IX1xqUiA=m22?title=(720P%20-%20mp4)Net-juu%20no%20Susume%20Episode%207'
r = requests.get(url,stream=True)
remotesize = r.headers['content-length']
print("Downloading {}.mp4!\n\n".format(url.split('title=')[1]))
pbar = ProgressBar(max_value=int(remotesize),widgets=widgets).start()
i = 0
with open('./tempy/tempy_file.mp4', 'wb') as f:
    for chunk in r.iter_content(chunk_size=5*1024*1024):
        if chunk:
            i = i + len(chunk)
            f.write(chunk)
            pbar.update(int(i/int(remotesize) * 100))
pbar.finish()
print("Successfully downloaded!\n\n")
Not sure if this GitHub issue was fixed. shutil.copyfileobj(r.raw) isn't what I'm looking for.
As per a suggestion, I tried including random user agents as shown:
desktop_agents = ['Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14',
'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0']
from random import choice  # choice() is used by random_headers() below

def random_headers():
    return {'User-Agent': choice(desktop_agents),'Accept':'text/html,video/mp4,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}
and sending the request with these headers as r = requests.get(url,stream=True,headers=random_headers())
However, it made no difference. :(
I also tried it with a sample video from "http://www.sample-videos.com/video/mp4/720/big_buck_bunny_720p_5mb.mp4". The same problem persists. :/
Upvotes: 4
Views: 6152
Reputation: 11
This was asked a while ago, but I believe that the issue observed by OP is not in the downloading but rather in the reporting. The progress bar is being filled up to a max value equal to the value of remotesize:
pbar = ProgressBar(max_value=int(remotesize),widgets=widgets).start()
However, inside the download loop, the bar is updated with the number of bytes downloaded so far, i, divided by remotesize and multiplied by 100:
pbar.update(int(i/int(remotesize) * 100))
In essence, the first line sets up the bar to expect absolute values ranging from 0 to the total size of the file in bytes, while the second passes it a percentage ranging from 0 to 100. So at any moment the bar is comparing a percentage against a byte count (e.g. 9 out of 321053151 bytes), and this is why the bar thinks that progress is extremely slow.
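To make the mismatch concrete, here is a small arithmetic sketch. It assumes the speed widget simply divides the bar's current value by the elapsed time, which is my understanding of how FileTransferSpeed behaves. The 5 MB/s throughput and the 10 second mark are made-up numbers, and reported_speed is a hypothetical helper, not part of progressbar2.

```python
# Illustrative numbers only: the file size is the example figure from the
# text above, and the throughput is an assumed 5 MB/s (roughly 40 Mbps).
TOTAL_BYTES = 321_053_151
BYTES_PER_SECOND = 5_000_000


def reported_speed(value_passed_to_bar, elapsed_seconds):
    """Speed as a rate widget might compute it: bar value / elapsed time."""
    return value_passed_to_bar / elapsed_seconds


elapsed = 10.0                                 # seconds since the start
downloaded = int(BYTES_PER_SECOND * elapsed)   # bytes received so far

# What the original loop passes to the bar: a percentage between 0 and 100.
buggy_value = int(downloaded / TOTAL_BYTES * 100)
# What a bar created with max_value=TOTAL_BYTES expects: a byte count.
correct_value = downloaded

print(buggy_value, reported_speed(buggy_value, elapsed))
print(correct_value, reported_speed(correct_value, elapsed))
```

Under these assumptions the percentage-based value yields a rate of about one "byte" per second even though megabytes are arriving, which matches the kind of reading described in the question.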
Hence either change the first line to:
pbar = ProgressBar(max_value=100,widgets=widgets).start()
Or, change the second line to:
pbar.update(i)
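As a rough sketch of the second option, the loop can report the running byte total rather than a percentage. The helper name save_stream and its parameters are hypothetical, and the chunks here are stand-in data so the example runs without a network connection or any third-party module.

```python
import io


def save_stream(chunks, out_file, on_progress):
    """Write byte chunks to out_file, reporting cumulative bytes written.

    chunks      -- any iterable of bytes objects
    out_file    -- a binary file-like object opened for writing
    on_progress -- callable that receives the running byte total
    """
    written = 0
    for chunk in chunks:
        if not chunk:          # skip empty chunks
            continue
        out_file.write(chunk)
        written += len(chunk)
        on_progress(written)   # absolute bytes, to match max_value=<file size>
    return written


# Stand-in data instead of a real HTTP response.
fake_chunks = [b"a" * 1024, b"", b"b" * 512]
seen = []
buffer = io.BytesIO()
total = save_stream(fake_chunks, buffer, seen.append)
print(total, seen)
```

In the question's script this would presumably correspond to passing r.iter_content(chunk_size=...) as chunks, the opened file as out_file, and pbar.update as on_progress, with the bar created using max_value=int(remotesize). Reporting bytes rather than a percentage also keeps the transfer-speed widget meaningful, since it then sees byte counts.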
Upvotes: 0
Reputation: 864
So, as the others suggested, Google was throttling the speed. To overcome this, I used Selenium WebDriver to download the links:
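from selenium import webdriver

# dir_name (the download directory) and li (the download link) are defined elsewhere in the script.
chrome_options = webdriver.ChromeOptions()
prefs = {'download.default_directory': dir_name}
chrome_options.add_experimental_option('prefs', prefs)
# Newer Selenium releases take options= instead of chrome_options=.
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get(li)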
Well, at least I'm able to completely automate the download at whatever speed Google Chrome's downloader allows.
So if anyone can help me figure this one out, please reply in the comments and I'll upvote them if helpful.
Here's the link to the complete script.
Upvotes: 1
Reputation: 77
Have you tried filling your request headers with a user agent and any other headers that Google may need, so that it doesn't flag you as a bot and limit your download speed?
Upvotes: 0