Reputation: 12007
I have a group of about 10 gzipped files that I would like to archive into a single file in order for a user to download. I am wondering what the best approach to this would be.
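1. Decompress the files, tar them, and gzip the resulting archive: myfiles.tar.gz
2. Tar the gzipped files as they are: myfiles.tar

Option 1 seems to have unnecessary steps as the original files are already compressed.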
Option 2 seems confusing because there is no indication that the files inside the archive are indeed compressed.
How do people usually deal with archiving a group of already compressed files?
I am using Python (if it matters), but I am doing the operations via shell executions.
Upvotes: 0
Views: 187
Reputation: 59563
A gzipped archive of uncompressed files is definitely what your users will want. Since you are using Python, you can skip shelling out and make things a bit cleaner (IMO). The script below uses tarfile and gzip.GzipFile to handle the archival and compression parts.
Editorial Note: while writing this I stumbled across an interesting bug that you might want to be aware of - https://blog.nelhage.com/2010/02/a-very-subtle-bug/
    import gzip
    import io
    import sys
    import tarfile

    # Usage: script.py OUTPUT.tar.gz INPUT1.gz [INPUT2.gz ...]
    with tarfile.open(sys.argv[1], mode='w:gz') as archive:
        for name in sys.argv[2:]:
            # Decompress the whole input file into memory.
            with gzip.GzipFile(name) as gzip_file:
                data = gzip_file.read()
            # Copy permissions, timestamps, etc. from the .gz file on disk.
            info = archive.gettarinfo(name)
            if info.name.endswith('.gz'):
                info.name = info.name[:-3]
            # The tar header must carry the uncompressed size.
            info.size = len(data)
            # BytesIO rather than StringIO: the decompressed data is bytes.
            archive.addfile(info, fileobj=io.BytesIO(data))
I probably would not do this if the uncompressed files are large, since it reads each one into memory in a single chunk. On the plus side, it retains file attributes such as permissions and timestamps in the archive.
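If memory is a concern, one possible variation (a sketch, not part of the script above) is to decompress each file into a temporary file first, because the tar header needs the uncompressed size before the data is written. The helper names `add_gunzipped` and `build_archive` are made up for this illustration.

```python
import gzip
import shutil
import tarfile
import tempfile


def add_gunzipped(archive, name):
    """Add the decompressed contents of the gzip file `name` to an open tarfile."""
    # Take mode, mtime, and ownership from the .gz file on disk.
    info = archive.gettarinfo(name)
    if info.name.endswith('.gz'):
        info.name = info.name[:-3]
    with gzip.open(name, 'rb') as src, tempfile.TemporaryFile() as tmp:
        # copyfileobj copies in fixed-size chunks, so memory use stays bounded.
        shutil.copyfileobj(src, tmp)
        # The tar header needs the uncompressed size up front.
        info.size = tmp.tell()
        tmp.seek(0)
        archive.addfile(info, fileobj=tmp)


def build_archive(output, names):
    """Write a gzipped tar archive containing the decompressed inputs."""
    with tarfile.open(output, mode='w:gz') as archive:
        for name in names:
            add_gunzipped(archive, name)
```

The trade-off is extra disk I/O for the temporary copy in exchange for not holding a whole file in memory.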
Upvotes: 0
Reputation: 241701
A gzipped tar archive is not an archive of compressed files. It is a compressed archive of files. In contrast, a zip archive is an archive of compressed files.
An archive of compressed files is a better archive format, if you want to be able to extract (or update) individual files. But it is an inferior compression technique; unless the component files are mostly quite large or already compressed, compressing the files individually results in quite a bit more overhead.
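A rough way to see the overhead is to compress a handful of small, similar inputs one by one and then as a single stream. The sample data here is invented purely for illustration, and real files will behave differently depending on their size and content.

```python
import gzip

# Ten small, similar "files" (hypothetical data for the comparison).
files = [
    ("record %d: " % i + "the quick brown fox jumps over the lazy dog\n" * 20).encode()
    for i in range(10)
]

# Each separate gzip stream pays for its own header, trailer, and dictionary warm-up.
individually = sum(len(gzip.compress(data)) for data in files)

# One stream over the concatenated data can reuse matches across files.
together = len(gzip.compress(b"".join(files)))

print("compressed one by one:   ", individually)
print("compressed as one stream:", together)
```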
Since the primary use case of gzipped tar archives is transmission of complete repositories, and the entire archive is normally decompressed at once, the fact that it is not possible to decompress and extract an individual file [Note 1] is not a huge cost. On the other hand, the improved compression ratio brings a noticeable benefit.
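For what it is worth, pulling one file out of a gzipped tar archive still works at the API level with Python's tarfile module; the cost is that the compressed stream has to be read through to find it. A minimal sketch, where `read_member` is a hypothetical helper name:

```python
import tarfile


def read_member(archive_path, member_name):
    """Return the bytes of one regular-file member of a gzipped tar archive."""
    with tarfile.open(archive_path, mode='r:gz') as archive:
        # Locating the member means reading tar headers out of the gzip
        # stream, so the archive is decompressed along the way.
        member = archive.getmember(member_name)
        return archive.extractfile(member).read()
```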
To answer the question, the only way to combine multiple gzipped tar archives is to decompress all of them, combine them into a single tar archive, and then recompress the result, which is option 1 in the original post.
[Note 1] You can ask for an individual file to be extracted from a gzipped tar archive, and the tar utility will do that transparently. But under the hood, the archive itself is being decompressed. It is not even possible to list the contents of a gzipped tar archive without decompressing the entire archive.
Upvotes: 1