Reputation: 479
I have a huge tgz archive, and I know it contains several directories and no files in the root.
I want to know the exact total size of the files in each directory so I can estimate whether they'll fit on my mounted volumes.
I found this thread helpful: https://stackoverflow.com/a/11721660/1004388
Upvotes: 0
Views: 1376
Reputation: 4327
Using only awk and a single pipe:
tar tzvf /tmp/backup.tgz | awk '{
split($6, a, "/") ; arr[a[1]] += $3
}
END {
for (key in arr) printf("%s\t%s\n", key, arr[key])
}'
This code assumes that the third field of the output ($3) contains the file size and the sixth ($6) contains the file path.
Awk's split function splits the file path into parts at the separator (/) and stores these parts in the array a.
The rest is the same as in the two previous answers: a[1] contains the root directory for each file in the archive, and arr[a[1]] accumulates the file sizes.
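To see the grouping without a real archive, here is a minimal sketch that feeds a few made-up listing lines through the same awk program. The sample lines and the function names sample_listing and sum_by_top_dir are hypothetical; the lines only imitate the layout assumed above (size in $3, path in $6).

```shell
# Hypothetical sample lines in the layout the answer assumes:
# permissions, owner/group, size, date, time, path.
sample_listing() {
  printf '%s\n' \
    '-rw-r--r-- root/root 100 2020-01-01 12:00 usr/bin/tool' \
    '-rw-r--r-- root/root 250 2020-01-01 12:00 usr/lib/libx.so' \
    '-rw-r--r-- root/root 40 2020-01-01 12:00 boot/vmlinuz'
}

# Same awk program as in the answer. The trailing sort only makes the
# output order stable, because awk's "for (key in arr)" order is unspecified.
sum_by_top_dir() {
  awk '{
    split($6, a, "/"); arr[a[1]] += $3
  }
  END {
    for (key in arr) printf("%s\t%s\n", key, arr[key])
  }' | sort
}

sample_listing | sum_by_top_dir
```

On these sample lines it prints boot with 40 and usr with 350 (100 + 250), separated by a tab.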
Upvotes: 0
Reputation: 28
(Posting as a response, because I cannot comment yet.)
The response of Ilya above is correct, but on my system (Mac OS), the output of tar tzvf /tmp/root.tgz had a different column layout, so I had to replace the first sed and cut with awk '{ print $5,$9 }' (i.e. keep only the 5th and 9th columns), giving this new one-liner:
tar tzvf /tmp/root.tgz | awk '{ print $5,$9 }' | cut -f1 -d'/' | awk '{
arr[$2]+=$1
}
END {
for (key in arr) printf("%s\t%s\n", key, arr[key])
}'
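Since the only difference between systems is which columns hold the size and the path, one option is to pass the column numbers as parameters. This is just a sketch: sum_top_dirs is a hypothetical helper, and the sample line merely imitates a layout with the size in column 5 and the path in column 9, as described above.

```shell
# Hypothetical helper: sum_top_dirs SIZE_COL PATH_COL
# Reads a verbose tar listing on stdin and sums sizes per top-level directory.
sum_top_dirs() {
  awk -v s="$1" -v p="$2" '{
    split($p, a, "/"); arr[a[1]] += $s
  }
  END {
    for (key in arr) printf("%s\t%s\n", key, arr[key])
  }' | sort   # sort only for a stable output order
}

# Made-up lines with size in column 5 and path in column 9 (assumed layout).
printf '%s\n' \
  '-rw-r--r--  0 root wheel 100 Jan  1 12:00 usr/bin/tool' \
  '-rw-r--r--  0 root wheel  40 Jan  1 12:00 boot/kernel' |
  sum_top_dirs 5 9
```

With a real archive this would be used as tar tzvf /tmp/root.tgz | sum_top_dirs 5 9 on the layout above, or with 3 6 where the size is in the third column and the path in the sixth.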
Upvotes: 0
Reputation: 479
This one-liner would do the trick:
tar tzvf /tmp/root.tgz | sed 's/ \+/ /g' | cut -f3,6- -d' ' | cut -f1 -d'/' | awk '{
arr[$2]+=$1
}
END {
for (key in arr) printf("%s\t%s\n", key, arr[key])
}'
Example output:
usr 821233945
boot 11150620
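To check what each stage of the pipeline does, the intermediate results can be inspected on a single line. This is a sketch on a made-up listing line; it uses sed 's/  */ /g', which squeezes spaces the same way as the \+ form but does not rely on that GNU sed extension.

```shell
# Hypothetical listing line: size in the third field, path in the sixth.
line='-rw-r--r-- root/root   821233945 2020-01-01 12:00 usr/bin/tool'

# Stage 1: squeeze runs of spaces into a single space.
stage1=$(printf '%s\n' "$line" | sed 's/  */ /g')
# Stage 2: keep the third field (size) and everything from the sixth (path).
stage2=$(printf '%s\n' "$stage1" | cut -f3,6- -d' ')
# Stage 3: keep only the text before the first "/" (size and top-level directory).
stage3=$(printf '%s\n' "$stage2" | cut -f1 -d'/')

printf '%s\n' "$stage1" "$stage2" "$stage3"
```

The last line printed is "821233945 usr", which is the size and top-level directory pair that the final awk step sums.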
Explanation:
tar tzvf filename: lists all files in the archive in a long-listing (ls -l) style.
sed: contracts multiple spaces into a single one to help cutting.
cut: extracts the third and sixth fields and leaves everything after the sixth field intact, with space as the delimiter. Now we have the size in the first column and the file path in the second column.
cut: since we need only top-level entries, we cut at the first directory separator, keeping only the first field with / as the delimiter.
awk: groups by the second field, summing the first one.
Upvotes: 2