Reputation: 4811
I have a Django app that creates a .tar.gz file for download. My local dev machine runs Python 2.7, and my remote dev server runs Python 2.6.6. I can open the files downloaded from either machine in Mac Finder or from the command line and view their contents. However, Python 2.7 rejects the .tar.gz file created on my remote dev server, and I need to upload these files to a site that uses Python to unpack and parse the archives. How can I debug what is wrong? In a Python shell:
>>> tarfile.is_tarfile('myTestFile_remote.tar.gz')
False
>>> tarfile.is_tarfile('myTestFile_local.tar.gz')
True
>>> f = tarfile.open('myTestFile_remote.tar.gz', 'r:gz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1678, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1727, in gzopen
    **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1705, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1574, in __init__
    self.firstmember = self.next()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2331, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header
From this SO question, I also tried running gzip -t against the remote file, but got no output (which I believe means the file is OK). From this other SO question, I ran file myTestFile_remote.tar.gz, and I believe the output shows a correct file format:
myTestFile_remote.tar.gz: gzip compressed data, from Unix
I'm not quite sure what else I can try. It seems like the exception is thrown because my tarfile has self.offset == 0, but I don't know what that means, and I don't understand how to create the tarfile so that this does not happen. Suggestions are welcome...
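One way to dig further from Python is to look at the raw bytes instead of relying on tarfile's error message. The sketch below is only an illustration: describe_archive and make_sample_tar_gz are hypothetical helper names, and io.BytesIO stands in for cStringIO so the snippet runs on both Python 2.7 and Python 3. It checks the gzip magic bytes, decompresses the data, and then looks for the "ustar" magic at offset 257 of the first tar header. It also flags the case where the decompressed payload is itself gzip data, which is one possibility worth ruling out when command-line tools accept a file that tarfile rejects.

```python
import gzip
import io
import tarfile
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def describe_archive(raw):
    """Return a short description of what the bytes of a supposed .tar.gz hold."""
    if raw[:2] != GZIP_MAGIC:
        return "not gzip"
    try:
        payload = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    except (IOError, EOFError, zlib.error):
        return "corrupt gzip stream"
    # If the payload starts with the gzip magic again, the data was compressed twice.
    if payload[:2] == GZIP_MAGIC:
        return "gzip inside gzip"
    # ustar-style tar headers (POSIX, GNU, pax) carry "ustar" at byte offset 257.
    if payload[257:262] == b"ustar":
        return "tar inside gzip"
    return "gzip, but payload is not a ustar archive"


def make_sample_tar_gz():
    """Build a tiny .tar.gz in memory so the check can be tried without files."""
    buf = io.BytesIO()
    tar = tarfile.open(mode="w:gz", fileobj=buf)
    data = b"hello"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
    tar.close()
    return buf.getvalue()


sample = make_sample_tar_gz()
print(describe_archive(sample))
```

To check a downloaded file, pass in its bytes, for example describe_archive(open('myTestFile_remote.tar.gz', 'rb').read()), and compare the result for the local and remote files.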
Not sure what code would be useful here. My code to create and return the tarfile:
zip_filename = '%s_%s.tar.gz' % (course.name, course.url)
s = cStringIO.StringIO()
zf = tarfile.open(zip_filename, mode='w:gz', fileobj=s)
<add a bunch of stuff>
zipped = zip_collection(zip_data)
zf.close()
if zipped:
    response = HttpResponse(content_type="application/tar")
    response['Content-Disposition'] = 'attachment; filename=%s' % zip_filename
    s.seek(0, os.SEEK_END)
    response.write(s.getvalue())
------ UPDATE ------
Per this SO post, I also verified that the remote file is a tar.gz file, using tar -zxvf myTestFile_remote.tar.gz from the command line. The file extracts just fine.
Upvotes: 0
Views: 3045
Reputation: 3806
I think the problem is in zlib and not in the tarfile itself.
Workarounds:
Create the file using bz2 instead of gzip:
tarfile.open(zip_filename, mode='w:bz2', fileobj=s)
Force the compression level (for both writing and reading):
zf = tarfile.open(zip_filename, mode='w:gz', fileobj=s, compresslevel=9)
zf = tarfile.open(zip_filename, mode='r:gz', compresslevel=9)
Lower the compression level (level below, stepping from 9 down to 0) until the problem disappears:
zf = tarfile.open(zip_filename, mode='w:gz', fileobj=s, compresslevel=level)
Remove compression entirely:
tarfile.open(zip_filename, mode='w', fileobj=s)
As a last resort, if compression is absolutely needed and none of the above works, pipe the tar stream through an external gzip process:
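import subprocess
import tarfile

f = open(zip_filename, "wb")  # binary mode: gzip writes raw bytes
proc = subprocess.Popen(["gzip", "-9"], stdin=subprocess.PIPE, stdout=f)
tar = tarfile.open(fileobj=proc.stdin, mode="w|")  # stream mode, uncompressed tar
tar.add(...)
tar.close()
proc.stdin.close()
proc.wait()  # let gzip finish writing before closing the output file
f.close()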
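To see which of these workarounds produces an archive that Python can read back, a quick round-trip test in memory may help. This is a rough sketch under stated assumptions, not the asker's code: build_archive and read_names are hypothetical helpers, the members dict is placeholder content, and io.BytesIO is used in place of cStringIO so it runs on Python 2.7 and 3. Running it on both the local and the remote Python should show whether a given mode fails on only one of them.

```python
import io
import tarfile


def build_archive(mode, members, **open_kwargs):
    """Write members (name -> bytes) into an in-memory archive and return its bytes."""
    buf = io.BytesIO()
    tar = tarfile.open(mode=mode, fileobj=buf, **open_kwargs)
    for name, data in sorted(members.items()):
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    tar.close()  # flushes the tar trailer and the compressor
    return buf.getvalue()


def read_names(raw):
    """Reopen the bytes, letting tarfile detect the compression, and list the members."""
    tar = tarfile.open(fileobj=io.BytesIO(raw), mode="r:*")
    try:
        return tar.getnames()
    finally:
        tar.close()


members = {"a.txt": b"alpha", "b.txt": b"beta"}  # placeholder content
attempts = [
    ("w:gz", {"compresslevel": 9}),
    ("w:gz", {"compresslevel": 1}),
    ("w:bz2", {}),
    ("w", {}),
]
for mode, kwargs in attempts:
    raw = build_archive(mode, members, **kwargs)
    print(mode, kwargs, len(raw), read_names(raw))
```

If every mode round-trips on the server but the downloaded file still fails, the archive is probably being altered somewhere after it is created, rather than by zlib.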
Upvotes: 2