Reputation: 479

How to list directory sizes from a tar.gz archive using bash

I have a huge tgz archive. I know it contains several directories and no files in the root.

I want to know the exact total size of the files inside each directory, to estimate whether they'll fit on my mounted volumes.

I found this answer helpful: https://stackoverflow.com/a/11721660/1004388

Upvotes: 0

Answers (3)

wl2776

Reputation: 4327

Using only awk and a single pipe:

tar tzvf /tmp/backup.tgz | awk '{
    split($6, a, "/") ; arr[a[1]] += $3
} 
END {
   for (key in arr) printf("%s\t%s\n", key, arr[key])
}'

This code assumes that the third field of the output ($3) contains the file size and the sixth ($6) contains the file path.

Awk's split function splits the file path into parts at the separator (/) and stores them in the array a.

The rest is the same as in the other two answers. a[1] holds the top-level directory of each entry in the archive, and arr[a[1]] accumulates the file sizes.
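To see the accumulation without needing a real archive, here is a minimal sketch that runs the same awk logic over a made-up listing. The function name sum_top_level, its two column parameters, and the sample lines are all hypothetical. The defaults 3 and 6 match the layout assumed above.

```bash
#!/usr/bin/env bash
# Sum sizes per top-level directory from a `tar tv`-style listing on stdin.
# size_col and path_col are hypothetical parameters: 3 and 6 match the layout
# assumed in this answer; other tar implementations may need other values.
sum_top_level() {
    local size_col="${1:-3}" path_col="${2:-6}"
    awk -v s="$size_col" -v p="$path_col" '
        {
            split($p, parts, "/")      # parts[1] is the top-level directory
            total[parts[1]] += $s      # accumulate the size in bytes
        }
        END {
            # %.0f avoids exponent notation for very large totals
            for (dir in total) printf("%s\t%.0f\n", dir, total[dir])
        }' | LC_ALL=C sort
}

# Made-up listing lines, used here instead of a real archive.
sample_listing='drwxr-xr-x root/root         0 2012-07-30 10:00 usr/
-rw-r--r-- root/root      1500 2012-07-30 10:00 usr/bin/tool
-rw-r--r-- root/root       500 2012-07-30 10:00 usr/lib/libx.so
-rw-r--r-- root/root      2048 2012-07-30 10:00 boot/vmlinuz'

printf '%s\n' "$sample_listing" | sum_top_level 3 6
```

For this sample it prints boot with 2048 and usr with 2000, separated by a tab. With a real archive the call would be tar tzvf /tmp/backup.tgz | sum_top_level 3 6. Paths whose top-level directory contains spaces would not be handled correctly, since awk splits fields on whitespace.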

Upvotes: 0

janx

Reputation: 28

(Posting as a response, because I cannot comment yet.)

Ilya's answer is correct, but on my system (Mac OS) the output of tar tzvf /tmp/root.tgz has a different column layout, so I had to replace the sed and the first cut with awk '{ print $5,$9 }' (i.e. keep only the 5th and 9th columns), giving this new one-liner:

tar tzvf /tmp/root.tgz | awk '{ print $5,$9 }' | cut -f1 -d'/' | awk '{
    arr[$2]+=$1
   }
   END {
     for (key in arr) printf("%s\t%s\n", key, arr[key])
   }'
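Since the column positions depend on which tar is installed, one option is to pick them from the tar --version banner. The sketch below is only an assumption about how that could be done: the function name listing_columns is hypothetical, the two banner strings are illustrative examples, and matching on "bsdtar" is a heuristic rather than a guarantee.

```bash
#!/usr/bin/env bash
# Map a `tar --version` banner to the size and path columns of `tar tv` output.
# The 5/9 pair is the one reported for Mac OS in this answer; 3/6 is the pair
# used by the sed/cut one-liner. The detection rule itself is an assumption.
listing_columns() {
    case "$1" in
        *bsdtar*) echo "5 9" ;;   # layout reported for Mac OS
        *)        echo "3 6" ;;   # layout assumed otherwise
    esac
}

# Usage sketch (commented out; /tmp/root.tgz is the example path from above):
# read -r size_col path_col <<< "$(listing_columns "$(tar --version | head -n 1)")"
# tar tzvf /tmp/root.tgz | awk -v s="$size_col" -v p="$path_col" '{ print $s, $p }'

# Illustrative banner strings, not captured from a particular machine.
listing_columns "bsdtar 3.5.3 - libarchive 3.5.3"
listing_columns "tar (GNU tar) 1.34"
```

It is worth checking a few lines of tar tzvf output by eye before trusting either pair of columns.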

Upvotes: 0

Ilya Sheershoff

Reputation: 479

This one-liner would do the trick:

tar tzvf /tmp/root.tgz | sed 's/ \+/ /g' | cut -f3,6- -d' ' | cut -f1 -d'/' | awk '{
    arr[$2]+=$1
   }
   END {
     for (key in arr) printf("%s\t%s\n", key, arr[key])
   }'

Example output:

usr     821233945
boot    11150620

Explanation:

tar tzvf filename lists all entries in the archive in a long format similar to ls -l
sed collapses runs of spaces into a single space so that cut can split on them
the first cut keeps the third field and everything from the sixth field onward, using a space as the delimiter; this leaves the size in the first column and the file path in the second
the second cut keeps only the first field, using / as the delimiter; since we need only top-level entries, everything after the first directory separator is dropped
awk groups by the second field and sums the first
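The stages above can be traced on a single line. This is a small sketch using a made-up listing line in the layout the one-liner assumes. The variable names are hypothetical, and the \+ in the sed expression is a GNU sed extension, so other sed implementations may need a different pattern.

```bash
#!/usr/bin/env bash
# One made-up line in the verbose listing layout the one-liner assumes.
line='-rw-r--r-- root/root      1500 2012-07-30 10:00 usr/bin/tool'

# Stage 1: collapse runs of spaces into one (\+ is a GNU sed extension).
stage1="$(printf '%s\n' "$line" | sed 's/ \+/ /g')"
# Stage 2: keep field 3 (size) and everything from field 6 on (path).
stage2="$(printf '%s\n' "$stage1" | cut -f3,6- -d' ')"
# Stage 3: keep only what precedes the first "/".
stage3="$(printf '%s\n' "$stage2" | cut -f1 -d'/')"

printf '%s\n' "$stage1" "$stage2" "$stage3"
# With GNU sed this prints:
#   -rw-r--r-- root/root 1500 2012-07-30 10:00 usr/bin/tool
#   1500 usr/bin/tool
#   1500 usr
```

The last line is what awk receives: the size in $1 and the top-level directory in $2.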

Upvotes: 2
