zer0stimulus

Reputation: 23676

Python: How to download a zip file

I'm attempting to download a zip file using this code:

o = urllib2.build_opener( urllib2.HTTPCookieProcessor() )

#login
p = urllib.urlencode( { usernameField: usernameVal, passField: passVal } )
f = o.open(authUrl,  p )
data = f.read()
print data
f.close()

#download file
f = o.open(remoteFileUrl)
localFile = open(localFile, "wb")
localFile.write(f.read())
f.close()

I am getting some binary data, but the size of the file I "downloaded" is too small and is not a valid zip file. Am I not retrieving the zip file properly? The HTTP response header for f = o.open(remoteFileUrl) is shown below. I don't know if special processing is needed to handle this response:

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Pragma: private
Cache-Control: must-revalidate
Expires: Tue, 31 Dec 1997 23:59:59 GMT
Content-Disposition: inline; filename="files.zip";
Content-Type: application/zip
Transfer-Encoding: chunked

Upvotes: 6

Views: 6451

Answers (4)

Gourneau

Reputation: 12878

Here is a more robust solution that uses urllib2 to download the file in chunks and print the progress of the download:

import os
import urllib2
import math

def downloadChunks(url):
    """Helper to download large files.

    The only argument is a URL. The file is saved under /tmp/ using the
    URL's base name. It is downloaded in chunks, and the percentage
    downloaded so far is printed after each chunk when the server
    reports a Content-Length. Returns the local path, or False on error.
    """

    baseFile = os.path.basename(url)

    # make the files created below group-writable
    os.umask(0002)
    temp_path = "/tmp/"
    try:
        file = os.path.join(temp_path, baseFile)

        req = urllib2.urlopen(url)
        # Content-Length is missing for chunked responses, so allow None
        length = req.info().getheader('Content-Length')
        total_size = int(length.strip()) if length else None
        downloaded = 0
        CHUNK = 256 * 10240
        with open(file, 'wb') as fp:
            while True:
                chunk = req.read(CHUNK)
                if not chunk:
                    break
                fp.write(chunk)
                downloaded += len(chunk)
                if total_size:
                    # float() avoids Python 2 integer division, which
                    # would print 0 until the download completes
                    print math.floor((float(downloaded) / total_size) * 100)
    except urllib2.HTTPError, e:
        print "HTTP Error:", e.code, url
        return False
    except urllib2.URLError, e:
        print "URL Error:", e.reason, url
        return False

    return file
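
The response in the question uses Transfer-Encoding: chunked, so it may carry no Content-Length at all, and a percentage cannot be computed in that case. Below is a minimal sketch of a progress helper that handles both situations. It is written for Python 3 (an assumption, since the code above is Python 2), and format_progress is a hypothetical name, not part of urllib2.

```python
def format_progress(downloaded, total_size=None):
    """Return a progress string for a download.

    total_size is None when the server sent no Content-Length
    (for example with Transfer-Encoding: chunked).
    """
    if total_size:
        # Multiply before the floor division so that small ratios
        # are not truncated to 0.
        percent = downloaded * 100 // total_size
        return "{}% ({} of {} bytes)".format(percent, downloaded, total_size)
    return "{} bytes (total size unknown)".format(downloaded)
```

Calling it once per chunk inside the read loop would give a byte counter for chunked responses and a percentage otherwise.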

Upvotes: 1

leoluk

Reputation: 12981

Try this:

#download file
f = o.open(remoteFileUrl)

response = ""
while 1:
    data = f.read()
    if not data:
        break
    response += data

with open(localFile, "wb") as local_file:
    local_file.write(response)
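
Since the whole response is held in memory here, it could be checked before it is written to disk. The sketch below uses the standard zipfile module and is written for Python 3 (an assumption; the snippet above is Python 2). The name looks_like_zip is hypothetical.

```python
import io
import zipfile

def looks_like_zip(data):
    """Return True if the bytes in data form a readable zip archive."""
    buf = io.BytesIO(data)
    # is_zipfile() checks for the zip end-of-central-directory record.
    if not zipfile.is_zipfile(buf):
        return False
    buf.seek(0)
    try:
        with zipfile.ZipFile(buf) as archive:
            # testzip() returns the name of the first corrupt member,
            # or None when every member passes its CRC check.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False
```

A truncated download would typically fail this check, because the record that is_zipfile() looks for sits at the end of the archive.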

Upvotes: 0

James

Reputation: 424

If you don't mind reading the whole zip file into memory, a simple way to read and write it is as follows:

data = f.readlines()
with open(localFile,'wb') as output:
    output.writelines(data)

Otherwise, to read and write in chunks as you get them over the network, do:

with open(localFile, "wb") as output:
    # pass a size so that each read returns at most one chunk
    chunk = f.read(8192)
    while chunk:
        output.write(chunk)
        chunk = f.read(8192)

This is a little less neat, but avoids keeping the whole file in memory at once. Hope it helps.
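
The standard library also has a helper for this kind of chunked copy, shutil.copyfileobj. Here is a minimal sketch for Python 3 (an assumption; the code in this thread is Python 2). The name save_stream and the 64 KiB default are illustrative choices only.

```python
import shutil

def save_stream(src, dst_path, chunk_size=64 * 1024):
    """Copy the file-like object src to dst_path, chunk_size bytes at a time."""
    # The with block closes the output file, which flushes any
    # buffered data to disk.
    with open(dst_path, "wb") as output:
        shutil.copyfileobj(src, output, chunk_size)
```

In Python 3, the response object returned by urllib.request.urlopen is file-like, so it could be passed as src.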

Upvotes: 1

RichieHindle

Reputation: 281875

f.read() doesn't necessarily read the whole file, but just a packet of it (which might be the whole file if it's small, but won't be for a large file).

You need to loop over the packets like this:

while 1:
    packet = f.read()
    if not packet:
        break
    localFile.write(packet)
f.close()
# close the output too, so buffered data is flushed to disk
localFile.close()

f.read() returns an empty packet to signify that you've read the whole file.
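
The same loop can be sketched as a small function that works on any pair of file-like objects, which makes it easy to try without a network connection. This version is for Python 3 (an assumption; the code above is Python 2), passes an explicit size to read(), and uses the hypothetical name copy_in_chunks.

```python
import io

def copy_in_chunks(src, dst, chunk_size=8192):
    """Read src until it is exhausted, writing each packet to dst.

    Returns the number of bytes copied.
    """
    copied = 0
    while True:
        packet = src.read(chunk_size)
        if not packet:  # empty bytes signals the end of the stream
            break
        dst.write(packet)
        copied += len(packet)
    return copied

# In-memory stand-ins for the HTTP response and the local file.
source = io.BytesIO(b"x" * 20000)
target = io.BytesIO()
print(copy_in_chunks(source, target))  # prints 20000
```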

Upvotes: 10
