user

Reputation: 4811

Python tarfile not creating valid .tar.gz file

I have a Django app that creates a .tar.gz file for download. My local dev machine runs Python 2.7, and my remote dev server runs Python 2.6.6. When I download the files from either one, I can open them via Mac Finder or the command line and view the contents. However, Python 2.7 rejects the .tar.gz file created on my remote dev server, and I need to upload these files to a site that uses Python to unpack and parse the archives. How can I debug what is wrong? In a Python shell:

>>> tarfile.is_tarfile('myTestFile_remote.tar.gz')
False

>>> tarfile.is_tarfile('myTestFile_local.tar.gz')
True

>>> f = tarfile.open('myTestFile_remote.tar.gz', 'r:gz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1678, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1727, in gzopen
    **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1705, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1574, in __init__
    self.firstmember = self.next()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2331, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

Following this SO question, I also ran gzip -t against the remote file, and it produced no output (which I believe means the file is OK). Following this other SO question, I ran file myTestFile_remote.tar.gz, and I believe the output shows a correct file format:

myTestFile_remote.tar.gz: gzip compressed data, from Unix

I'm not quite sure what else I can try. It seems like the exception is thrown because my tarfile has self.offset == 0, but I don't know what that means, and I don't understand how to create the tarfile so that this does not happen. Suggestions are welcome...
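One way to narrow down an "invalid header" error is to strip the gzip layer by hand and look at the first 512-byte tar header block directly. The following is only a minimal sketch: `inspect_tar_header` is a hypothetical helper (not part of tarfile), and it assumes the archive uses a ustar-style header (POSIX or GNU), where the magic string "ustar" sits at offset 257.

```python
import gzip


def inspect_tar_header(fileobj):
    """Decompress a gzip stream and report on the first tar header block."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        block = gz.read(512)
    return {
        # A valid archive with at least one member starts with a full 512-byte block.
        "block_length": len(block),
        # POSIX (ustar) and GNU archives both carry "ustar" at offset 257.
        "has_ustar_magic": block[257:262] == b"ustar",
        # The member name occupies the first 100 bytes, NUL-padded.
        "first_name": block[:100].rstrip(b"\0"),
    }
```

Calling it as `inspect_tar_header(open('myTestFile_remote.tar.gz', 'rb'))` for both the local and the remote file and comparing the results should show whether the decompressed bytes start with a sensible header at all, or whether something else (for example a second layer of compression) comes first.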

Not sure what code would be useful here. My code to create and return the tarfile:

zip_filename = '%s_%s.tar.gz' % (course.name, course.url)
s = cStringIO.StringIO()
zf = tarfile.open(zip_filename, mode='w:gz', fileobj=s)

<add a bunch of stuff>

zipped = zip_collection(zip_data)
zf.close()

if zipped:
    response = HttpResponse(content_type="application/tar")
    response['Content-Disposition'] = 'attachment; filename=%s' % zip_filename
    s.seek(0, os.SEEK_END)
    response.write(s.getvalue())

------ UPDATE ------ Per this SO post, I also verified that the remote file is a tar.gz file, using tar -zxvf myTestFile_remote.tar.gz from the command line. The file extracts just fine.
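It may also help to reopen the in-memory archive on the server before it is written to the response, which would show whether the bytes are already unreadable for Python when they leave the 2.6.6 machine or are altered somewhere afterwards. A rough sketch, not the code from my app: `build_tar_gz` and `member_names` are hypothetical helpers, and the `files` mapping of names to byte strings stands in for whatever the real view adds to the archive.

```python
import io
import tarfile


def build_tar_gz(files):
    """Return the bytes of a .tar.gz holding the given {name: bytes} mapping."""
    buf = io.BytesIO()
    tar = tarfile.open(mode="w:gz", fileobj=buf)
    try:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    finally:
        # close() writes the end-of-archive blocks and the gzip trailer,
        # so the buffer must not be read before this point.
        tar.close()
    return buf.getvalue()


def member_names(payload):
    """Reopen archive bytes and list member names (raises tarfile.ReadError if invalid)."""
    tar = tarfile.open(mode="r:gz", fileobj=io.BytesIO(payload))
    try:
        return tar.getnames()
    finally:
        tar.close()
```

If `member_names(build_tar_gz(...))` succeeds on the server but the downloaded file still fails locally, the difference is more likely in how the response is delivered than in how the archive is built.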

Upvotes: 0

Views: 3045

Answers (1)

sax

Reputation: 3806

I think the problem is in zlib, not in tarfile itself.

Workarounds:

  • create the file using bz2

    tarfile.open(zip_filename, mode='w:bz2', fileobj=s)

  • force the compression level (for both writing and reading)

    zf = tarfile.open(zip_filename, mode='w:gz', fileobj=s, compresslevel=9)

    zf = tarfile.open(zip_filename, mode='r:gz', compresslevel=9)

  • lower the compression level until the problem disappears

    zf = tarfile.open(zip_filename, mode='w:gz', fileobj=s, compresslevel=N)  # try N from 9 down to 0

  • remove compression entirely

    tarfile.open(zip_filename, mode='w', fileobj=s)

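As a last resort, if compression is absolutely required and none of the above works, pipe the tar stream through an external gzip process:

import subprocess
import tarfile

f = open(zip_filename, "wb")  # binary mode: gzip writes raw bytes
proc = subprocess.Popen(["gzip", "-9"], stdin=subprocess.PIPE, stdout=f)
tar = tarfile.open(fileobj=proc.stdin, mode="w|")  # uncompressed tar stream
tar.add(...)
tar.close()
proc.stdin.close()  # signal EOF so gzip can finish
proc.wait()  # let gzip flush its output before the file is closed
f.close()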
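The "lower the compression level until the problem disappears" workaround could be sketched as a loop that writes the archive at each level and immediately tries to reopen it. This is only an illustration under assumptions: `readable_levels` is a hypothetical helper, the `files` mapping of names to byte strings is a stand-in for the real content, and it tries levels 9 down to 1 by default.

```python
import io
import tarfile


def readable_levels(files, levels=range(9, 0, -1)):
    """Write the archive at each gzip level; return the levels that reopen cleanly."""
    ok = []
    for level in levels:
        buf = io.BytesIO()
        tar = tarfile.open(mode="w:gz", fileobj=buf, compresslevel=level)
        try:
            for name, data in sorted(files.items()):
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        finally:
            tar.close()
        buf.seek(0)
        try:
            # Opening in 'r:gz' mode reads the first header, which is
            # where "invalid header" would be raised.
            tarfile.open(mode="r:gz", fileobj=buf).close()
        except tarfile.ReadError:
            continue
        ok.append(level)
    return ok
```

Note that this checks the archive with the same Python that wrote it. To reproduce the problem from the question, the archives would need to be written on the 2.6.6 server and read on the 2.7 machine.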

Upvotes: 2
