Reputation: 153
I am trying to get a large file from the web, and stream it directly into the zipfile writer provided by the zipfile
module, something like:
from urllib.request import urlopen
from zipfile import ZipFile
zip_file = ZipFile("/a/certain/local/zip/file.zip","a")
entry = zip_file.open("an.entry","w")
entry.write( urlopen("http://a.certain.file/on?the=web") )
Apparently, this doesn't work because .write accepts a bytes argument, not an I/O reader. However, since the file is rather large, I don't want to load the whole file into RAM before compressing it.
The simple solution is to use bash (never really tried, could be wrong):
curl -s "http://a.certain.file/on?the=web" | zip -q /a/certain/local/zip/file.zip
but putting a single line of bash in a Python script would be neither elegant nor convenient.
Another solution is to use urllib.request.urlretrieve to download the file and then pass the path to zipfile.ZipFile.write, but that way I would still have to wait for the download to complete, and it would also consume a lot more disk I/O.
Is there a way, in Python, to directly pass the download stream to a zipfile writer, like the bash pipeline above?
Upvotes: 5
Views: 4607
Reputation: 1124308
You can use shutil.copyfileobj() to efficiently copy data between file objects:
from shutil import copyfileobj
from urllib.request import urlopen
from zipfile import ZipFile

with ZipFile("/a/certain/local/zip/file.zip", "w") as zip_file:
    with zip_file.open("an.entry", "w") as entry:
        with urlopen("http://a.certain.file/on?the=web") as response:
            # Stream the response into the archive member chunk by chunk
            copyfileobj(response, entry)
This repeatedly calls .read() with a given chunk size on the source file object, then passes each chunk to the .write() method on the target file object.
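To make that read/write loop concrete, here is a rough sketch of what such a copy does, written out by hand. The helper name copy_in_chunks, the 64 KiB default chunk size, and the in-memory demo data are all made up for illustration; this is not the actual implementation of shutil.copyfileobj(), and it uses io.BytesIO objects in place of the network response and the on-disk archive so it can be run without network access.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED


def copy_in_chunks(source, target, chunk_size=64 * 1024):
    """Copy source to target one chunk at a time; return the bytes copied.

    Hypothetical helper: only one chunk is held in memory at any moment.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        target.write(chunk)
        total += len(chunk)
    return total


def demo():
    """Stream an in-memory 'download' into an in-memory zip archive."""
    payload = b"example data " * 10_000  # stand-in for the HTTP response body
    archive = io.BytesIO()               # stand-in for the zip file on disk
    with ZipFile(archive, "w", ZIP_DEFLATED) as zip_file:
        # Writing to a member via open(..., "w") needs Python 3.6 or newer
        with zip_file.open("an.entry", "w") as entry:
            copied = copy_in_chunks(io.BytesIO(payload), entry, chunk_size=4096)
    with ZipFile(archive) as zip_file:
        return copied, zip_file.read("an.entry") == payload
```

In real use the source would be the object returned by urlopen() and the archive a path on disk, as in the answer's example.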
If you are using Python 3.5 or older (where you can't yet directly write to a ZipFile member), your only option is to stream to a temporary file first:
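from shutil import copyfileobj
from tempfile import NamedTemporaryFile
from urllib.request import urlopen
from zipfile import ZipFile

with ZipFile("/a/certain/local/zip/file.zip", "w") as zip_file:
    with NamedTemporaryFile() as cache:
        with urlopen("http://a.certain.file/on?the=web") as response:
            copyfileobj(response, cache)
        # Make sure everything is on disk before zipfile reads it by name
        cache.flush()
        # ZipFile.write() takes the source path first, then the archive name
        zip_file.write(cache.name, "an.entry")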
Using a NamedTemporaryFile() like this only works on POSIX systems. On Windows, you can't open the same filename again while the file is still open, so you'd have to use a name generated by tempfile.mkstemp(), open the file from there, and use try...finally to clean up afterwards.
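A minimal sketch of that mkstemp() variant might look like the following. The function name add_stream_via_tempfile and the in-memory demo are assumptions made for illustration; the idea is simply to close the temporary file before handing its name to ZipFile.write(), and to remove it in a finally block.

```python
import io
import os
from shutil import copyfileobj
from tempfile import mkstemp
from zipfile import ZipFile


def add_stream_via_tempfile(zip_file, arcname, source):
    """Spool source to a temp file, then add that file to zip_file as arcname.

    Hypothetical helper; returns the (already deleted) temp path.
    """
    fd, path = mkstemp()
    try:
        # Wrap the OS-level descriptor in a file object and fill it
        with os.fdopen(fd, "wb") as cache:
            copyfileobj(source, cache)
        # The temp file is closed at this point, so reopening it by name
        # is not blocked by our own open handle
        zip_file.write(path, arcname)
    finally:
        os.remove(path)  # clean up whether or not the copy succeeded
    return path


if __name__ == "__main__":
    # In-memory stand-ins for the archive and the download stream
    archive = io.BytesIO()
    with ZipFile(archive, "w") as zf:
        add_stream_via_tempfile(zf, "an.entry", io.BytesIO(b"hello world"))
    with ZipFile(archive) as zf:
        print(zf.namelist())
```

With a real download, source would be the response object returned by urlopen().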
Upvotes: 8