Reputation: 927
Given the following folder structure (with the size in bytes in parenthesis):
- dir
- f1.txt (1754)
- f2.txt (9811)
When I run gzip -r dir, I get:
- dir
- f1.txt.gz (654)
- f2.txt.gz (804)
Now when I do tar -cf dir.tar dir (where dir contains the compressed files), I expect the size of dir.tar to be roughly 654 + 804 = 1458. But it turns out that it is 10240, which is close to the size of the original f1.txt + f2.txt! Why?
Upvotes: 0
Views: 111
Reputation: 3735
Let's work through an example to confirm what you are seeing.
Here I have a directory, x, with two files.
# ls -l x
total 12
-rw-r--r-- 1 root root 3902 Jan 30 17:00 log1.txt
-rw-r--r-- 1 root root 7518 Jan 30 17:00 log.txt
Compress the files
# gzip -9v x/*
x/log1.txt: 90.6% -- replaced with x/log1.txt.gz
x/log.txt: 84.5% -- replaced with x/log.txt.gz
Confirm that compression has worked
# ls -l x
total 8
-rw-r--r-- 1 root root 392 Jan 30 17:00 log1.txt.gz
-rw-r--r-- 1 root root 1195 Jan 30 17:00 log.txt.gz
Put the files into a tar, x.tar
# tar cvf x.tar x
x/
x/log1.txt.gz
x/log.txt.gz
and check the resulting size. I got 10240 as well.
# ls -l x.tar
-rw-r--r-- 1 root root 10240 Jan 31 09:02 x.tar
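The reason is quite simple: the tar format works in fixed-size blocks, so there will be a lot of padding with NUL bytes. Every entry gets a 512-byte header, its data is padded up to a multiple of 512 bytes, and GNU tar by default pads the whole archive to a full record of 20 blocks, which is 10240 bytes. See here for the gory details. For small files like these, the padding bytes dominate. If you look at a hex dump of this tar file, it contains mostly NUL padding bytes.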
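To make the padding arithmetic concrete, here is a rough sketch of how the archive size comes about. The function name tar_size is made up for illustration. It assumes 512-byte blocks, one header block per entry, a single directory entry, the two-block end-of-archive marker, and GNU tar's default blocking factor of 20. Long file names or extended headers would add more blocks, so treat the result as an estimate.

```shell
#!/bin/sh
# Estimate the size of an uncompressed tar archive that holds one
# directory entry plus the given files (sizes in bytes).
# Assumptions (not taken from the tar output above): 512-byte blocks,
# one header block per entry, two zero blocks as end-of-archive marker,
# and a 20-block (10240-byte) record, which is GNU tar's default.
tar_size() {
    block=512
    record=$((20 * block))
    total=$block                                         # header block for the directory
    for size in "$@"; do
        data=$(( (size + block - 1) / block * block ))   # file data rounded up to whole blocks
        total=$(( total + block + data ))                # file header + padded data
    done
    total=$(( total + 2 * block ))                       # end-of-archive marker
    echo $(( (total + record - 1) / record * record ))   # round up to a whole record
}

tar_size 392 1195     # the two .gz files from the example: 10240
tar_size 3902 7518    # the two uncompressed files: 20480
```

With the sizes from the question, tar_size 654 804 also comes out at 10240: the real content only fills 4608 bytes, and the rest of the record is padding.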
This is why it is better to put the uncompressed version of the files into the tar, then compress that: the padding is all zeros, which compresses down to almost nothing.
Here is an example.
Put the uncompressed files into x.tar
# ls -l x
total 12
-rw-r--r-- 1 root root 3902 Jan 30 17:00 log1.txt
-rw-r--r-- 1 root root 7518 Jan 30 17:00 log.txt
# tar cvf x.tar x
x/
x/log1.txt
x/log.txt
# ls -l x.tar
-rw-r--r-- 1 root root 20480 Jan 31 09:06 x.tar
Now compress the tar file. 1761 bytes is a lot better.
# gzip -9v x.tar
x.tar: 91.7% -- replaced with x.tar.gz
# ls -l x.tar.gz
-rw-r--r-- 1 root root 1761 Jan 31 09:06 x.tar.gz
Upvotes: 2
Reputation: 690
It seems you generated the tar file from both the original and the compressed files. To make sure, you can list the tar file's contents:
tar -tf dir.tar
You can simply tar and gzip the directory in one step, as below:
tar -zcvf dir.tar.gz dir/
Hope this helps.
Upvotes: 0