Reputation: 14908
How can I extract the size of the total uncompressed file data in a .tar.gz file from command line?
Upvotes: 85
Views: 130149
Reputation: 1792
Another way that more accurately reports the total size with GNU tar and bsdtar is:
$ tar -O -xf mytar.tar | wc --bytes
The -O (upper case 'O') makes tar extract the files to stdout, all concatenated, so we are counting the number of bytes of the actual file contents.
Most of the other methods here boil down to counting the number of bytes in the archive (first decompressing it, if it's compressed). Why is that not accurate?
The archive is aligned to a multiple of $TAR_BLOCKING_FACTOR * 512 bytes! This is done by padding with zeroes at the end. If I remember correctly, the default blocking factor in GNU tar is 20, which means we align to 10 KiB! So in the worst case we may have 9.5 KiB of zeroes at the end of the archive. All in all, this means we may arrive at a count that is too high. The pathological case is a lot of small files whose sizes are not 512-byte aligned (for example, all 1-byte files).
$ mkdir test/
$ for i in $(seq 1001); do
> printf "x" > test/$i
> done
$ tar -czf test.tar.gz test/
$ zcat test.tar.gz | wc -c
1034240
$ tar -O -xf test.tar.gz | wc -c # or -xzf if we want to be explicit about compression
1001
In the above code we create 1001 files, each with 1 byte of data, and pack them into a gzipped tar archive. With the zcat test.tar.gz | wc -c approach we get a count of about 1 MiB, even though the actual content is only 1001 bytes!
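That 1034240 figure can be reproduced by hand, assuming GNU tar's defaults (a 512-byte header per member, data padded to 512-byte blocks, a 1024-byte end-of-archive marker, and a record size of 20 × 512 bytes):

```shell
# One 512 B header for the test/ directory itself, then for each of the
# 1001 one-byte files a 512 B header plus one 512 B (padded) data block,
# plus a 1024 B end-of-archive marker:
payload=$(( 512 + 1001 * (512 + 512) + 1024 ))
# Round up to the default record size of 20 * 512 B = 10240 B:
record=10240
echo $(( (payload + record - 1) / record * record ))
# → 1034240
```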
Upvotes: 0
Reputation: 1185
This works for any file size:
zcat archive.tar.gz | wc -c
For files smaller than 4 GiB you can also use the -l option of gzip:
$ gzip -l compressed.tar.gz
  compressed uncompressed  ratio uncompressed_name
         132        10240  99.1% compressed.tar
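For use in a script, the uncompressed-size column can be picked out with awk; a minimal sketch (the archive name and contents here are made up for demonstration):

```shell
# Build a small throwaway archive to demonstrate on.
printf 'hello\n' > file.txt
tar -czf archive.tar.gz file.txt

# gzip -l prints a header line first, so take field 2 of line 2.
gzip -l archive.tar.gz | awk 'NR==2 {print $2}'
```

For archives under 4 GiB this prints the same number as gzip -dc archive.tar.gz | wc -c.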
Upvotes: 88
Reputation: 19827
I know this is an old question, but I wrote a tool just for this two years ago. It's called gzsize
and it gives you the uncompressed size of a gzip'ed file without actually decompressing the whole file to disk:
$ gzsize <your file>
Upvotes: 12
Reputation: 109
I searched all over the web and found nothing that solves the problem of getting the size when the file is bigger than 4 GB.
First, which is fastest?
[oracle@base tmp]$ time zcat oracle.20180303.030001.dmp.tar.gz | wc -c
6667028480

real    0m45.761s
user    0m43.203s
sys     0m5.185s

[oracle@base tmp]$ time gzip -dc oracle.20180303.030001.dmp.tar.gz | wc -c
6667028480

real    0m45.335s
user    0m42.781s
sys     0m5.153s

[oracle@base tmp]$ time tar -tvf oracle.20180303.030001.dmp.tar.gz
-rw-r--r-- oracle/oinstall     111828 2018-03-03 03:05 oracle.20180303.030001.log
-rw-r----- oracle/oinstall 6666911744 2018-03-03 03:05 oracle.20180303.030001.dmp

real    0m46.669s
user    0m44.347s
sys     0m4.981s
All three take about 45 seconds, but tar -tvf is the one that reports per-file sizes. So how can we cancel execution once the file headers have been printed?
My solution is this:
[oracle@base tmp]$ time echo $(timeout --signal=SIGINT 1s tar -tvf oracle.20180303.030001.dmp.tar.gz | awk '{print $3}') | grep -o '[[:digit:]]*' | awk '{ sum += $1 } END { print sum }'
6667023572

real    0m1.005s
user    0m0.013s
sys     0m0.066s
Upvotes: 4
Reputation: 572
Use the following command:
tar -xzf archive.tar.gz --to-stdout | wc -c
Upvotes: 9
Reputation: 3051
The command gzip -l archive.tar.gz
doesn't work correctly for archives larger than 4 GiB, because the gzip format stores the uncompressed size only modulo 2^32. For really large files, use zcat archive.tar.gz | wc --bytes
instead.
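The gzip trailer stores the uncompressed length in a 32-bit field (ISIZE), which is why the reported size wraps around for large files. For archives under 4 GiB, that field can be read directly from the last four bytes without decompressing anything; a sketch, assuming a single-member gzip stream and a little-endian machine (od interprets the bytes in host byte order, matching gzip's little-endian field on x86/ARM):

```shell
# Build a small throwaway archive to demonstrate on.
printf 'some test data\n' > file.txt
tar -czf archive.tar.gz file.txt

# ISIZE = last 4 bytes of the stream: uncompressed size mod 2^32.
tail -c4 archive.tar.gz | od -An -tu4 | tr -d ' '
```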
Upvotes: 34
Reputation: 14908
This will sum the total content size of the extracted files:
$ tar tzvf archive.tar.gz | sed 's/ \+/ /g' | cut -f3 -d' ' | sed '2,$s/^/+ /' | paste -sd' ' | bc
The output is given in bytes.
Explanation: tar tzvf lists the files in the archive in verbose format, like ls -l. sed and cut isolate the file size field. The second sed puts a + in front of every size except the first, and paste concatenates them, giving a sum expression that is then evaluated by bc.
Note that this doesn't include metadata, so the disk space taken up by the files when you extract them is going to be larger - potentially many times larger if you have a lot of very small files.
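Assuming GNU tar's listing format (where the size is the third field), the same sum can be computed with a single awk; sketched here on a throwaway archive:

```shell
# Build a small archive with known content sizes (4 + 2 bytes).
printf 'aaaa' > a.txt
printf 'bb'   > b.txt
tar -czf archive.tar.gz a.txt b.txt

# Field 3 of GNU tar's verbose listing is the member size in bytes.
tar tzvf archive.tar.gz | awk '{ sum += $3 } END { print sum }'
# → 6
```

Note that bsdtar's verbose listing uses a different column layout, so the field number would need adjusting there.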
Upvotes: 45
Reputation: 13
A tar file is uncompressed until/unless it is filtered through another program, such as gzip, bzip2, lzip, compress, lzma, etc. The size of a tar file is close to the total size of the member files, plus a 512-byte header per member and zero padding to the block and record boundaries, so typically a few KiB of overhead is added to make it a valid tarball.
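The per-member overhead is easy to observe, assuming GNU tar: each member gets a 512-byte header, its data is padded to 512-byte blocks, and the archive ends with an end-of-archive marker of zero blocks (here -b 1 minimizes end-of-record padding):

```shell
printf 'x' > f.txt             # 1 byte of actual content
tar -b 1 -cf f.tar f.txt       # blocking factor 1: pad only to 512 B blocks
wc -c < f.tar                  # 512 B header + 512 B data block + end marker
```

Even a 1-byte file produces an archive of a couple of KiB, always a multiple of 512 bytes.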
Upvotes: -2