Govind Rai

Reputation: 15848

Why does a tar archive take up 1MB? Shouldn't it take up only 1KB?

I'm reading an article on the differences between tar, zip, and gz files.

I am having a hard time understanding how the author came up with 1MB as the size of the tar archive:

One key thing to remember is a plain tar file is just an archive whose data are not compressed. In other words, if you tar 100 files of 50kB, you will end up with an archive whose size will be around 5000kB. The only gain you can expect using tar alone would be by avoiding the space wasted by the file system as most of them allocate space at some granularity (for example, on my system, a one byte long file uses 4kB of disk space, 1000 of them will use 4MB but the corresponding tar archive “only” 1MB).

Shouldn't the size of the archive only take around 1KB? Here's my reasoning:

If you avoid the space wasted by the file system, then 1000 files x 1 byte per file should consume only 1000 bytes, or about 1KB. So the tar archive should be somewhere around 1KB in size. Why is it 1MB?

I also tested this scenario on my system (macOS Terminal):

mkdir test
cd test
for i in {1..1000}; do echo "" > $i.txt; done
cd ..
tar -cf tarredFile.tar test
ls -l tarredFile.tar

Even the file system shows 1MB for the tar archive. So I know my reasoning is incorrect, but I don't know why. What am I overlooking?

Upvotes: 1

Views: 460

Answers (1)

Mark Adler

Reputation: 112617

The tar format is written in 512-byte blocks. Each one-byte file takes 512 bytes for the header, which contains the file name, and 512 bytes for the file content, of which only one byte is significant. That is a minimum of 1024 bytes per file, so 1000 such files come to about 1MB.
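The arithmetic can be sketched in shell. This is only a rough estimate, assuming the plain ustar layout (one 512-byte header per member, content padded to whole blocks, one header for the `test` directory, and two zero-filled blocks at the end of the archive). The variable names are illustrative.

```shell
#!/bin/sh
# Rough size estimate for a tar archive of many tiny files.
# Assumption: plain ustar layout with no extra metadata entries.
block=512
files=1000
content_bytes=1          # size of each file's data

# Round each file's content up to a whole number of blocks.
content_blocks=$(( (content_bytes + block - 1) / block ))

# One header block plus the padded content.
per_file=$(( block + content_blocks * block ))

# Directory header + all members + two end-of-archive blocks.
total=$(( block + files * per_file + 2 * block ))

echo "per file: $per_file bytes"
echo "estimated archive: $total bytes"
```

The size that `ls -l` reports may be somewhat larger than this estimate: GNU tar pads the archive to a multiple of its record size (10240 bytes by default), and tar on macOS may add extra entries for file metadata.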

Most of that is zeros, so it compresses quite a bit. gzip gets it to about 9K. That is still far from 1K because the archive also has to store the names of the files.
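To see the effect of compression yourself, something like the following should work with either GNU tar or bsdtar. It is a sketch that rebuilds the 1000 one-byte files in a scratch directory (the `work` path comes from `mktemp`, not from the question) and compares the raw and gzip-compressed sizes. The exact numbers will vary by tar implementation and platform.

```shell
#!/bin/sh
# Recreate the 1000 one-byte files in a temporary directory, then
# compare the uncompressed and gzip-compressed archive sizes.
work=$(mktemp -d)
mkdir "$work/test"

i=1
while [ "$i" -le 1000 ]; do
    echo "" > "$work/test/$i.txt"   # a single newline byte per file
    i=$((i + 1))
done

# Write the archive to stdout and count bytes; the arithmetic
# expansion strips any padding that wc adds around the number.
tar_size=$(( $(tar -cf - -C "$work" test | wc -c) ))
gz_size=$(( $(tar -cf - -C "$work" test | gzip -c | wc -c) ))

echo "tar:    $tar_size bytes"
echo "tar.gz: $gz_size bytes"

rm -rf "$work"
```

The compressed size stays well above 1000 bytes because the 1000 distinct file names (and the other header fields) still have to be encoded.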

Upvotes: 4
