Chetan Shetty

Reputation: 47

How to read data from a zipfile on a website without locally downloading zipfile

I am using the following piece of code:

import zipfile
import urllib

link = "http://www.dummypage.com/dummyfile.zip"
file_handle = urllib.urlopen(link)
zip_file_object = zipfile.ZipFile(file_handle, 'r')

I get the following error on execution. Please help.

Traceback (most recent call last):
  File "fcc.py", line 34, in <module>
    zip_file_object = zipfile.ZipFile(file_handle)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 807, in _RealGetContents
    endrec = _EndRecData(fp)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 208, in _EndRecData
    fpin.seek(0, 2)
AttributeError: addinfourl instance has no attribute 'seek'

Upvotes: 0

Views: 335

Answers (2)

tdelaney

Reputation: 77347

Can you use external tools? @ruario's answer to 'Bash - how to unzip a piped zip file (from “wget -qO-”)' is very interesting. Basically, a zip file stores its directory at the end of the file, so zip tools tend to need the entire file before they can read the directory. However, the zip format also includes inline headers, and some tools can use those instead. If you don't mind calling out to bsdtar (or a similar tool), you can do this:

import urllib
import shutil
import subprocess as subp

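url_handle = urllib.urlopen("test.zip")
# bsdtar reads the archive from stdin ('-xf-') and extracts it as the data arrives
proc = subp.Popen(['bsdtar', '-xf-'], stdin=subp.PIPE)
# stream the response into bsdtar without saving the zip locally
shutil.copyfileobj(url_handle, proc.stdin)
# closing stdin signals end of input so bsdtar can finish
proc.stdin.close()
proc.wait()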
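
To make the layout described above concrete, here is a minimal sketch (written for Python 3, unlike the snippet above) that builds a tiny archive in memory and checks where the two kinds of records sit. The helper name build_sample_zip and the member name hello.txt are made up for illustration; the two signatures are the standard zip markers for a local file header and for the end-of-central-directory record.

```python
import io
import zipfile

# Standard zip record signatures.
LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"      # inline header before each member
END_OF_DIRECTORY_SIGNATURE = b"PK\x05\x06"  # marks the end of the central directory


def build_sample_zip():
    """Create a small zip archive in memory and return its raw bytes.

    Hypothetical helper, used only so the example needs no network access.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", "hello from inside the archive")
    return buffer.getvalue()


data = build_sample_zip()

# The archive begins with an inline (local) header, which streaming tools can use.
starts_with_local_header = data.startswith(LOCAL_HEADER_SIGNATURE)

# The directory's closing record sits at the very end, which is why
# zipfile.ZipFile seeks to the end of the file before reading anything.
directory_record_offset = data.rfind(END_OF_DIRECTORY_SIGNATURE)

print("starts with local header:", starts_with_local_header)
print("bytes from directory record to end of file:", len(data) - directory_record_offset)
```

This is only meant to show why a non-seekable handle fails with zipfile while a streaming tool such as bsdtar can still work from the inline headers.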

Upvotes: 0

Mauro Baraldi

Reputation: 6575

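zipfile.ZipFile needs a seekable file-like object, and the handle returned by urllib.urlopen has no seek method, which is what the traceback is telling you. Reading the response into an in-memory buffer solves this. For text data, the most common lib is StringIO. For binary data such as a zip archive, the right one is io.BytesIO.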

import io
import urllib
import zipfile

link = "http://www.dummypage.com/dummyfile.zip"
# read the whole response into memory and wrap it in a seekable buffer
file_handle = io.BytesIO(urllib.urlopen(link).read())
zip_file_object = zipfile.ZipFile(file_handle, 'r')

The point is that the file is still downloaded in full, but it is held in memory rather than saved to disk, so you don't need to manage a local copy.
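
For Python 3, where urllib.urlopen has moved to urllib.request.urlopen, the same idea might look like the sketch below. The function name open_remote_zip and its opener parameter are assumptions added for illustration (the parameter only exists so the download step can be swapped out); they are not part of any library.

```python
import io
import zipfile
from urllib.request import urlopen


def open_remote_zip(link, opener=urlopen):
    """Download `link` fully into memory and return a ZipFile over those bytes.

    `opener` is a hypothetical hook: any callable that takes the link and
    returns a readable, closable object. It defaults to urllib.request.urlopen.
    """
    with opener(link) as response:
        payload = response.read()  # the entire archive is now in memory
    # BytesIO supports seek(), which ZipFile needs to find the directory
    return zipfile.ZipFile(io.BytesIO(payload), "r")
```

Usage would be along the lines of zip_file_object = open_remote_zip(link), followed by zip_file_object.namelist() to list the members or zip_file_object.read(name) to get one member's bytes. Since the whole archive sits in memory, this is best suited to files that comfortably fit in RAM.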

Upvotes: 1
