Reputation: 4563
I am maintaining some old scripts and I came across this:
tar -cvf - ${files} | gzip -n -c | openssl ...
Is there any practical difference between this and the more compact form below, which omits the "-n" to gzip? Is there some other way to pass "-n" to gzip from the tar command?
tar -cvzf - ${files} | openssl ...
This is on Linux 3.0.101-0.47.71-default. I would expect a slight performance improvement, but my main concern is to avoid causing any change downstream.
Upvotes: 0
Views: 731
Reputation: 36337
Old versions of tar didn't have gzip compression built in. On any gzip I know, -n has no meaning when the data is fed through stdin, aside, of course, from setting the compression timestamp in the gzip data to 0.
I doubt the usefulness of that option, to be honest: on anything I could test, the size remained the same. That is correct behaviour: the header (as specified in RFC 1952, Sec. 2.3) always has a 4-byte timestamp field holding seconds since the epoch, and setting it to 0 means "no time stamp is available". So, unless you need to keep the receiver of the gzip'ed data from knowing when it was compressed, -n has no benefit. (There might be security benefits stemming from the omission of such timestamps if you've got some authentication scheme that is based on unknown timing, e.g. session IDs generated from the time since booting a device, but to be frank, I'd rather worry about plugging those security holes than about setting a timestamp to zero.)
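The size and timestamp claims above can be checked with a short script. This is only a sketch, assuming a gzip that accepts -n and an od that supports -j and -N; the sample payload and variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare gzip output read from stdin with and without -n.
sample='hello, tar and gzip'    # hypothetical sample payload

# Size in bytes of each compressed stream
size_default=$(printf '%s\n' "$sample" | gzip -c | wc -c | tr -d ' ')
size_n=$(printf '%s\n' "$sample" | gzip -n -c | wc -c | tr -d ' ')

# Bytes 5-8 of the gzip header hold the 4-byte timestamp (RFC 1952)
mtime_n=$(printf '%s\n' "$sample" | gzip -n -c | od -An -tx1 -j4 -N4 | tr -d ' \n')

echo "size without -n: $size_default"
echo "size with -n:    $size_n"
echo "timestamp bytes with -n: $mtime_n"
```

The two sizes should match, because the timestamp field is always present in the header and -n only changes its value.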
Upvotes: 2
Reputation: 112219
It turns out that there can be a significant difference, at least for the tar that I'm using (the default tar on macOS). When writing to stdout, tar writes all output in blocks and pads the last block with zeros, even when compressing; the padding comes after the compressed data. This is documented in the man page, so it is not a bug. The default block size is 10K. The result of tar -cvzf is almost always larger than that of tar -cvf ... | gzip, by an average of 5K, using the default blocking factor.
From the man page:
All archive output is written in correctly-sized blocks, even if the
output is being compressed. Whether or not the last output block is padded
to a full block size varies depending on the format and the output
device. For tar and cpio formats, the last block of output is padded to
a full block size if the output is being written to standard output or to
a character or block device such as a tape drive. If the output is being
written to a regular file, the last block will not be padded. Many
compressors, including gzip(1) and bzip2(1), complain about the null padding
when decompressing an archive created by tar, although they still extract
it correctly.
I am using bsdtar 2.8.3.
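To see where the "average of 5K" comes from, here is a rough sketch of the rounding, assuming the 10K block size mentioned above. The function name padded_size and the sample sizes are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: null padding added when output is rounded up to a full 10K block.
block=10240    # the 10K default block size, in bytes

padded_size() {
    # Round $1 up to the next multiple of $block
    echo $(( ($1 + block - 1) / block * block ))
}

for size in 1 4000 10240 10241 25000; do    # hypothetical compressed sizes
    padded=$(padded_size "$size")
    echo "$size -> $padded (padding: $(( padded - size )) bytes)"
done
```

The padding ranges from 0 to just under 10240 bytes, so if compressed sizes are spread evenly, the average overhead is about half a block.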
GNU tar 1.29 does not do this, though the documentation seems to indicate it should:
If the output goes directly to a local disk, and not through stdout,
then the last write is not extended to a full record size. Otherwise,
reblocking occurs.
The documentation goes on to note the same issue with gzip complaining about trailing zeros. Yet I see no trailing zeros when going through stdout.
Upvotes: 2