Chap

Reputation: 3835

An appendable compressed archive

I have a requirement to maintain a compressed archive of log files. The log filenames are unique and the archive, once expanded, is simply one directory containing all the log files.

The current solution isn't scaling well, since it involves a gzipped tar file: every time a log file is added, the entire archive is first decompressed, the file is added, and the result is re-gzipped.
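For reference, the current workflow amounts to something like the sketch below. It uses Python's `tarfile` in place of the actual shell commands and keeps members in memory instead of extracting them to disk; `naive_append` and the file names are hypothetical. The point it illustrates is that every single append reads and recompresses the whole archive.

```python
import io
import os
import tarfile
import tempfile


def naive_append(archive_path: str, log_name: str, data: bytes) -> None:
    """Add one log to a .tar.gz by rebuilding the whole archive (hypothetical helper).

    Every call decompresses and recompresses all existing members, so the
    cost of a single append grows with the total size of the archive.
    """
    members = []
    if os.path.exists(archive_path):
        # 1. Decompress the entire archive and read every member into memory.
        with tarfile.open(archive_path, "r:gz") as tar:
            for member in tar:
                if member.isfile():
                    members.append((member.name, tar.extractfile(member).read()))
    # 2. Add the new log.
    members.append((log_name, data))
    # 3. Re-gzip everything, old and new, into a fresh archive.
    with tarfile.open(archive_path, "w:gz") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "logs.tar.gz")  # hypothetical archive name
        for day in ("20130101", "20130102", "20130103"):
            naive_append(path, f"foo-{day}.log", f"entries for {day}\n".encode())
        with tarfile.open(path, "r:gz") as tar:
            print(tar.getnames())
```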

Is there a Unix archive tool that can add to a compressed archive without completely expanding and re-compressing it? Or can gzip do this, given the right combination of arguments?

Upvotes: 9

Views: 1429

Answers (3)

Hugues M.

Reputation: 20467

I'm using zip -Zb for that (appending text logs incrementally to a compressed archive):

  • fast append (index is at the end of archive, efficient to update)
  • -Zb uses the bzip2 compression method instead of deflate. As of 2018 this seems safe to use (you'll need a reasonably modern unzip; note that some tools assume deflate when they see a zip file, so YMMV)
  • 7z was a good candidate: its compression ratio is vastly better than zip's when you compress all files in the same operation. But when you append files to the archive one by one (incremental appending), its compression ratio is only marginally better than standard zip, and similar to zip -Zb. So for now I'm sticking with zip -Zb.
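A minimal sketch of the same incremental-append pattern using Python's standard `zipfile` module (an assumption: the answer uses the `zip` command-line tool, not Python). `append_log` and the file names are hypothetical; `zipfile.ZIP_BZIP2` selects the bzip2 method, as `-Zb` does for `zip`.

```python
import os
import tempfile
import zipfile


def append_log(archive_path: str, log_name: str, data: bytes) -> None:
    """Append one log to a zip archive, compressing only the new entry (hypothetical helper).

    Mode "a" creates the archive if it does not exist yet; otherwise the
    existing entries are left as they are and only the index (the central
    directory at the end of the file) is rewritten.
    ZIP_BZIP2 is the bzip2 method, the same one `zip -Zb` selects.
    """
    with zipfile.ZipFile(archive_path, mode="a", compression=zipfile.ZIP_BZIP2) as zf:
        zf.writestr(log_name, data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "logs.zip")  # hypothetical archive name
        append_log(path, "foo-20130101.log", b"day one\n" * 1000)
        append_log(path, "foo-20130102.log", b"day two\n" * 1000)
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                print(info.filename, info.file_size, info.compress_size)
```

Each call compresses only the new log, so apart from reading and rewriting the index, the cost of an append does not depend on how many logs are already stored.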

To clarify what happens, and why having the index at the end is useful for an "appendable" archive format whose entries are compressed individually:

Before:
############## ########### ################# #
[foo1.png    ] [foo2.png ] [foo3.png       ] ^
                                             |
                                         index

After:
############## ########### ################# ########### #
[foo1.png    ] [foo2.png ] [foo3.png       ] [foo4.png ] ^
                                                         |
                                                 new index

So this is not fopen in append mode, but presumably fopen in read/update mode ("r+", since "w" would truncate the file), then fseek to where the old index starts, then write the new entry followed by a new index (that's my mental model of it, someone let me know if this is wrong). I'm not 100% certain that it would be so simple in reality; it might depend on the OS and file system (e.g. a file system with snapshots might have a very different opinion about how to deal with small writes at the end of a file… huge "YMMV" here 🤷🏻‍♂️)
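The before/after layout can be checked with Python's `zipfile`, used here only as a stand-in for the `zip` tool. In CPython's implementation, opening an existing archive with mode "a" opens the file for read/update and seeks to the start of the old central directory, so the new entry's local header should land exactly where the old index began, and the bytes of the existing entries should be unchanged. `index_offset` and `append_and_inspect` are hypothetical helpers, and the end-of-central-directory parsing assumes an archive with no comment and no ZIP64 records.

```python
import os
import struct
import tempfile
import zipfile

# End-of-central-directory (EOCD) record: signature, four 16-bit counters,
# size and offset of the central directory, comment length (22 bytes total).
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)


def index_offset(path: str) -> int:
    """Return the byte offset where the central directory (the index) starts.

    Assumes no archive comment and no ZIP64 records, so the EOCD record is
    exactly the last 22 bytes of the file.
    """
    with open(path, "rb") as f:
        f.seek(-EOCD_SIZE, os.SEEK_END)
        fields = struct.unpack(EOCD_FORMAT, f.read(EOCD_SIZE))
    if fields[0] != b"PK\x05\x06":
        raise ValueError("EOCD record not found at end of file")
    return fields[6]  # offset of the start of the central directory


def append_and_inspect(path: str) -> dict:
    # "Before": three individually compressed entries, as in the diagram.
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_BZIP2) as zf:
        for name in ("foo1.png", "foo2.png", "foo3.png"):
            zf.writestr(name, name.encode() * 100)
    old_index = index_offset(path)
    with open(path, "rb") as f:
        before = f.read()

    # "After": append foo4.png in place.
    with zipfile.ZipFile(path, "a", compression=zipfile.ZIP_BZIP2) as zf:
        zf.writestr("foo4.png", b"foo4.png" * 100)
    with open(path, "rb") as f:
        after = f.read()
    with zipfile.ZipFile(path) as zf:
        new_entry_offset = zf.getinfo("foo4.png").header_offset

    return {
        "old_index": old_index,
        "new_entry_offset": new_entry_offset,  # CPython writes it where the old index began
        "new_index": index_offset(path),
        "old_entries_untouched": after[:old_index] == before[:old_index],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(append_and_inspect(os.path.join(workdir, "demo.zip")))
```

This only demonstrates what Python's implementation does; the `zip` tool may behave differently, and the file-system caveats above still apply.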

Upvotes: 5

jwd

Reputation: 11114

If you don't need to use tar, I suggest 7-Zip. It has an 'add' command, which I believe does what you want.

See related SO question: Is there a way to add a folder to existing 7za archive?

Also, the 7-Zip documentation: https://sevenzip.osdn.jp/chm/cmdline/commands/add.htm

Upvotes: 0

devnull

Reputation: 123478

It's rather easy to have an appendable archive of compressed files (not the same as an appendable compressed archive, though).

tar has an option to append files to the end of an archive (assuming that you have GNU tar):

 -r, --append
       append files to the end of an archive

You can gzip the log files before adding to the archive and can continue to update (append) the archive with newer files.

$ ls
foo-20130101.log
foo-20130102.log
foo-20130103.log
$ gzip foo*
$ ls
foo-20130101.log.gz
foo-20130102.log.gz
foo-20130103.log.gz
$ tar cvf backup.tar foo*gz

Now you have another log file to add to the archive:

$ ls
foo-20130104.log
$ gzip foo-20130104.log
$ tar rvf backup.tar foo-20130104.log.gz
$ tar tf backup.tar
foo-20130101.log.gz
foo-20130102.log.gz
foo-20130103.log.gz
foo-20130104.log.gz
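The same workflow as a Python sketch using the standard `gzip` and `tarfile` modules (an assumption: the answer itself uses GNU tar on the command line). `append_gzipped_log` and `read_log` are hypothetical helpers; each log is compressed on its own and then appended to a plain, uncompressed `.tar`.

```python
import gzip
import io
import os
import tarfile
import tempfile
import time


def append_gzipped_log(tar_path: str, log_name: str, data: bytes) -> None:
    """gzip one log, then append it as <log_name>.gz to an uncompressed tar (hypothetical helper).

    Like `tar -r`, tarfile's "a" mode only works on an uncompressed archive:
    it skips past the last member, overwrites the end-of-archive blocks and
    writes the new member there. The archive is created if it is missing.
    """
    payload = gzip.compress(data)
    info = tarfile.TarInfo(log_name + ".gz")
    info.size = len(payload)
    info.mtime = int(time.time())
    with tarfile.open(tar_path, mode="a") as tar:
        tar.addfile(info, io.BytesIO(payload))


def read_log(tar_path: str, log_name: str) -> bytes:
    """Fetch and decompress a single log; the other members stay compressed."""
    with tarfile.open(tar_path) as tar:
        return gzip.decompress(tar.extractfile(log_name + ".gz").read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "backup.tar")
        for day in ("20130101", "20130102", "20130103", "20130104"):
            append_gzipped_log(path, f"foo-{day}.log", f"entries for {day}\n".encode())
        with tarfile.open(path) as tar:
            print(tar.getnames())
        print(read_log(path, "foo-20130104.log"))
```

Because every member is compressed separately, the overall ratio will usually be worse than for a single .tar.gz of the same logs; that is the trade-off behind "archive of compressed files" versus "compressed archive".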

Upvotes: 3
