Reputation: 23676
I'm attempting to download a zip file using this code:
import urllib
import urllib2

o = urllib2.build_opener(urllib2.HTTPCookieProcessor())

# log in
p = urllib.urlencode({usernameField: usernameVal, passField: passVal})
f = o.open(authUrl, p)
data = f.read()
print data
f.close()

# download the file
f = o.open(remoteFileUrl)
localFile = open(localFile, "wb")
localFile.write(f.read())
f.close()
I am getting some binary data, but the file I "downloaded" is too small and is not a valid zip file. Am I not retrieving the zip file properly? The HTTP response headers for f = o.open(remoteFileUrl) are shown below; I don't know whether special processing is needed to handle this response:
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Pragma: private
Cache-Control: must-revalidate
Expires: Tue, 31 Dec 1997 23:59:59 GMT
Content-Disposition: inline; filename="files.zip";
Content-Type: application/zip
Transfer-Encoding: chunked
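Since the symptom is "too small and not a valid zip", a quick way to check what actually got saved is the stdlib zipfile module. A minimal, self-contained sketch (the good.zip/bad.zip filenames are mine; the truncated copy just simulates a partial download):

```python
import io
import zipfile

# build a real zip in memory, then a truncated copy (simulating a bad download)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
good_bytes = buf.getvalue()
bad_bytes = good_bytes[: len(good_bytes) // 2]  # cut off, like a partial read

with open("good.zip", "wb") as f:
    f.write(good_bytes)
with open("bad.zip", "wb") as f:
    f.write(bad_bytes)

print(zipfile.is_zipfile("good.zip"))  # True
print(zipfile.is_zipfile("bad.zip"))   # False
```

is_zipfile looks for the end-of-central-directory record, which sits at the end of the archive, so a download cut short will fail the check.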
Upvotes: 6
Views: 6451
Reputation: 12878
Here is a more robust solution that uses urllib2 to download the file in chunks, printing the download progress as it goes:
import os
import math
import urllib2

def downloadChunks(url):
    """Helper to download large files.

    The only argument is a URL. The file is saved to a temp
    directory, downloaded in chunks, and the percentage completed
    is printed as it goes.
    """
    baseFile = os.path.basename(url)

    # move the file to a more unique path
    os.umask(0002)
    temp_path = "/tmp/"
    try:
        file = os.path.join(temp_path, baseFile)
        req = urllib2.urlopen(url)
        total_size = int(req.info().getheader('Content-Length').strip())
        downloaded = 0
        CHUNK = 256 * 10240
        with open(file, 'wb') as fp:
            while True:
                chunk = req.read(CHUNK)
                downloaded += len(chunk)
                # float() avoids Python 2 integer division, which would print 0
                print math.floor((downloaded / float(total_size)) * 100)
                if not chunk:
                    break
                fp.write(chunk)
    except urllib2.HTTPError, e:
        print "HTTP Error:", e.code, url
        return False
    except urllib2.URLError, e:
        print "URL Error:", e.reason, url
        return False

    return file
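One caveat for the progress math above: a response sent with Transfer-Encoding: chunked (as in the question's headers) usually carries no Content-Length header, so getheader('Content-Length') can come back as None. The copy loop itself never needs the length, though; stripped to its core it is the same pattern shutil.copyfileobj uses. A minimal, version-agnostic sketch (the copy_in_chunks name and the BytesIO stand-in for the response object are mine):

```python
import io

def copy_in_chunks(src, dst, chunk_size=8192):
    """Copy one file-like object to another in fixed-size chunks.

    Returns the total number of bytes copied; works whether or not
    the source knows its own length.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# stand-in for a urllib2 response object
src = io.BytesIO(b"x" * 20000)
dst = io.BytesIO()
print(copy_in_chunks(src, dst))  # 20000
```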
Upvotes: 1
Reputation: 12981
Try this:
# download the file
f = o.open(remoteFileUrl)
response = ""
while 1:
    data = f.read()
    if not data:
        break
    response += data

with open(localFile, "wb") as local_file:
    local_file.write(response)
Upvotes: 0
Reputation: 424
If you don't mind reading the whole zip file into memory, the fastest way to read and write it is as follows:
data = f.readlines()
with open(localFile, 'wb') as output:
    output.writelines(data)
Otherwise, to read and write in chunks as you get them over the network, pass a size to read (otherwise it tries to read everything at once):
with open(localFile, "wb") as output:
    chunk = f.read(8192)
    while chunk:
        output.write(chunk)
        chunk = f.read(8192)
This is a little less neat, but avoids keeping the whole file in memory at once. Hope it helps.
Upvotes: 1
Reputation: 281875
f.read()
doesn't necessarily read the whole file, but just a packet of it (which might be the whole file if it's small, but won't be for a large file).
You need to loop over the packets like this:
while 1:
    packet = f.read()
    if not packet:
        break
    localFile.write(packet)
f.close()
f.read()
returns an empty packet to signify that you've read the whole file.
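The same read-until-empty contract holds for any file-like object, so the loop is easy to sanity-check without a network connection. A small sketch (BytesIO is a stand-in for the response object; the tiny read size just forces several packets):

```python
import io

f = io.BytesIO(b"abcdef")
packets = []
while 1:
    packet = f.read(2)  # small size to force multiple packets
    if not packet:
        break
    packets.append(packet)

print(packets)   # [b'ab', b'cd', b'ef']
print(f.read())  # b'' -- the empty result that signals end of stream
```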
Upvotes: 10