Reputation: 41
In Python 2.7.3, I'm trying to write a script that downloads a file over the Internet, using the urllib2 module.
Here is what I have done:
import urllib2
HTTP_client = urllib2.build_opener()
#### Here I can modify HTTP_client headers
URL = 'http://www.google.com'
data = HTTP_client.open(URL)
with open('file.txt', 'wb') as f:
    f.write(data.read())
OK, that works perfectly.
The problem comes when I want to save big files (hundreds of MB). I think that when I call the open method, it downloads the whole file into memory. But what about large files? It is not going to hold 1 GB of data in memory! And if I lose the connection, everything downloaded so far is lost.
How can I download large files in Python the way wget does? wget writes the file 'directly' to the hard disk, and you can watch the file grow in size.
I'm surprised there is no retrieve method for doing something like:
HTTP_client.retrieve(URL, 'filetosave.ext')
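For comparison, the legacy urllib module does ship something close to this (urllib.urlretrieve), but as far as I can tell it does not go through my custom opener, so the headers I set on HTTP_client would not be applied:
import urllib
# Streams the response straight to disk, but ignores the urllib2 opener,
# so custom headers set on HTTP_client are not used here.
urllib.urlretrieve(URL, 'filetosave.ext')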
Upvotes: 1
Views: 6219
Reputation:
To resolve this, read the response a chunk at a time and write each chunk to the file as you go:
import urllib2

url = 'http://www.google.com'   # URL to download (from the question)
CHUNK = 16 * 1024               # read 16 KB at a time

req = urllib2.urlopen(url)
with open('file.txt', 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(chunk)
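If you would rather not write the loop yourself, the standard library's shutil.copyfileobj should do essentially the same thing: it copies from one file object to another in fixed-size chunks (16 KB by default). A minimal sketch, reusing url and the output name from above:
import shutil
import urllib2

req = urllib2.urlopen(url)
with open('file.txt', 'wb') as fp:
    # copyfileobj reads and writes in chunks, so the whole response
    # never has to fit in memory at once.
    shutil.copyfileobj(req, fp, 16 * 1024)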
Upvotes: 2