Ilya Sheershoff

Reputation: 479

How to list directory sizes in a tar.gz archive using bash

I have a huge tgz archive. I know it contains several top-level directories and no files in the root.

I want to know the exact total size of the files inside each directory, to estimate whether they will fit on my mounted volumes.

I've found this thread helpful: https://stackoverflow.com/a/11721660/1004388

Upvotes: 0

Views: 1376

Answers (3)

wl2776

Reputation: 4327

Using only awk and a single pipe:

tar tzvf /tmp/backup.tgz | awk '{
    split($6, a, "/") ; arr[a[1]] += $3
} 
END {
   for (key in arr) printf("%s\t%s\n", key, arr[key])
}'

This code assumes that the third field of the tar output ($3) contains the file size and the sixth ($6) the file path.

Awk's split function splits the file path into parts on the separator (/) and stores them in the array a.

The rest is the same as in the two other answers. a[1] holds the top-level directory of each entry in the archive, and arr[a[1]] accumulates the file sizes.
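To try the awk logic without a real archive, here is a minimal sketch that feeds it a hand-written listing. The function name sum_by_top_dir and the sample lines are made up for illustration; the sample only imitates the GNU tar layout assumed above (size in field 3, path in field 6), and the trailing sort is added just to make the output order predictable.

```shell
# Sum sizes per top-level directory from a tar-style verbose listing on stdin.
# Assumes the size is field 3 and the path is field 6 (GNU tar layout).
sum_by_top_dir() {
    awk '{
        split($6, parts, "/")          # parts[1] is the top-level directory
        totals[parts[1]] += $3         # accumulate the size for that directory
    }
    END {
        for (dir in totals) printf("%s\t%s\n", dir, totals[dir])
    }' | sort
}

# Hand-written sample listing, not output from a real archive.
sample='drwxr-xr-x root/root 0 2020-01-01 10:00 usr/
-rw-r--r-- root/root 100 2020-01-01 10:00 usr/bin/a
-rw-r--r-- root/root 250 2020-01-01 10:00 usr/lib/b
-rw-r--r-- root/root 40 2020-01-01 10:00 boot/vmlinuz'

printf '%s\n' "$sample" | sum_by_top_dir
```

With a real archive the function would be used as tar tzvf /tmp/backup.tgz | sum_by_top_dir. Note that paths containing spaces are split across several fields, so this sketch only handles them correctly when the space comes after the first /.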

Upvotes: 0

janx

Reputation: 28

(Posting as a response, because I cannot comment yet.)

Ilya's answer is correct, but on my system (Mac OS) the output of tar tzvf /tmp/root.tgz has a different column layout, so I had to replace the first sed and cut with awk '{ print $5,$9 }' (i.e. keep only the 5th and 9th columns), giving this one-liner:

tar tzvf /tmp/root.tgz | awk '{ print $5,$9 }' | cut -f1 -d'/' | awk '{
    arr[$2]+=$1
   }
   END {
     for (key in arr) printf("%s\t%s\n", key, arr[key])
   }'
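Since the column positions differ between tar implementations, one option is to pass them in as parameters. The following is only a sketch: sum_by_columns is a hypothetical helper name, and the sample lines are hand-written to imitate a layout with the size in column 5 and the path in column 9, as described above. Check the real output of tar tzvf on your system before choosing the numbers.

```shell
# Sum sizes per top-level directory, with the column positions as arguments.
#   $1 = number of the size column, $2 = number of the path column
sum_by_columns() {
    awk -v s="$1" -v p="$2" '{
        split($p, parts, "/")          # parts[1] is the top-level directory
        totals[parts[1]] += $s         # accumulate the size for that directory
    }
    END {
        for (dir in totals) printf("%s\t%s\n", dir, totals[dir])
    }' | sort
}

# Hand-written sample in a 9-column layout, not output from a real archive.
bsd_sample='-rw-r--r--  0 ilya staff 100 Jan  1 10:00 usr/bin/a
-rw-r--r--  0 ilya staff 250 Jan  1 10:00 usr/lib/b
-rw-r--r--  0 ilya staff  40 Jan  1 10:00 boot/vmlinuz'

printf '%s\n' "$bsd_sample" | sum_by_columns 5 9
```

For the layout assumed in the other answers, the call would be sum_by_columns 3 6 instead.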

Upvotes: 0

Ilya Sheershoff

Reputation: 479

This one-liner would do the trick:

tar tzvf /tmp/root.tgz | sed 's/ \+/ /g' | cut -f3,6- -d' ' | cut -f1 -d'/' | awk '{
    arr[$2]+=$1
   }
   END {
     for (key in arr) printf("%s\t%s\n", key, arr[key])
   }'

Example output:

usr 821233945
boot    11150620

Explanation:

  1. tar tzvf filename lists all entries in the archive in a long format similar to ls -l
  2. sed collapses runs of spaces into a single space so that cut can split on them
  3. the first cut keeps the third field and everything from the sixth field onward, using a space as the delimiter; this leaves the size in the first column and the file path in the second
  4. the second cut keeps only the text before the first / on each line, since we need only the top-level directory
  5. awk groups by the second field and sums the first
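The steps above can be traced on a single line to see what each stage leaves behind. This is a sketch on a made-up listing line, not real archive output. It uses the sed pattern '  *' (two spaces and a star), which should behave like ' \+' but does not rely on a GNU extension.

```shell
# One hand-written line imitating the verbose tar listing.
line='-rw-r--r-- root/root   821233945 2020-01-01 10:00 usr/lib/libfoo.so'

# Step 2: collapse runs of spaces into one space.
step2=$(printf '%s\n' "$line" | sed 's/  */ /g')

# Step 3: keep field 3 (size) and everything from field 6 (path) onward.
step3=$(printf '%s\n' "$step2" | cut -f3,6- -d' ')

# Step 4: keep only the text before the first slash.
step4=$(printf '%s\n' "$step3" | cut -f1 -d'/')

printf '%s\n' "$step2" "$step3" "$step4"
# The last line printed is: 821233945 usr
```

Step 4 works only because step 3 has already dropped the owner/group field, which itself contains a slash.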

Upvotes: 2
