Matti

Reputation: 469

Human readable, recursive, sorted list of largest files

What is the best practice for printing a top 10 list of largest files in a POSIX shell? There has to be something more elegant than my current solution:

DIR="."
N=10
LIMIT=512000

find "$DIR" -type f -size +"${LIMIT}k" -exec du {} \; | sort -nr | head -n "$N" | perl -p -e 's/^\d+\s+//' | xargs -I {} du -h {}

where LIMIT is a file size threshold to limit the results of find.

Upvotes: 7

Views: 3874

Answers (1)

Dennis Williamson

Reputation: 360325

Edit:

Using GNU utilities (du and sort):

du -0h | sort -zrh | tr '\0' '\n'

This uses a null delimiter to pass information between du and sort, then uses tr to convert the nulls to newlines. The nulls allow the pipeline to handle filenames that include newlines. The -h option makes du print human-readable sizes, and the -h option to sort makes it compare those sizes by magnitude (so 2K sorts below 1M).
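To restrict the list to regular files and to the top N entries, as the question asks, the same pipeline can be fed from find and trimmed with head. This is a minimal sketch, assuming GNU find, du, sort and head (head -z needs coreutils 8.25 or later); largest_files is a hypothetical helper name, not something from the answer above.

```shell
#!/bin/sh
# Sketch: print the n largest regular files under a directory, human-readable.
# Assumes GNU find, du, sort and head; largest_files is a hypothetical name.
largest_files() {
    dir=${1:-.}   # directory to scan, defaults to the current one
    n=${2:-10}    # number of entries to keep, defaults to 10
    find "$dir" -type f -exec du -0h {} + |   # one NUL-terminated "size<TAB>path" record per file
        sort -zrh |                           # largest first, comparing human-readable sizes
        head -z -n "$n" |                     # keep the first n NUL-terminated records
        tr '\0' '\n'                          # convert NULs to newlines for display
}
```

For example, largest_files /var/log 5 would print the five largest files under /var/log.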

Original:

This uses awk to create extra columns for sort keys. It only calls du once. The output should look exactly like the output of du -h.

I've split it into multiple lines, but it can be recombined into a one-liner.

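# Prefix each line with "unit-rank zero-padded-value", sort on that prefix, then cut it off.
# $1 + 0 coerces the size field to its leading number (e.g. "4.5K" -> 4.5).
du -h |
  awk '{printf "%s %08.2f\t%s\n",
    index("KMG", substr($1, length($1))),
    $1 + 0, $0}' |
  sort -r | cut -f2-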

Explanation:

  • index the last character of the size field in the string "KMG", which substitutes 1, 2, 3 for K, M, G so that lines group by unit; if there's no unit (the size is less than 1K), there's no match and zero is returned (perfect!). Extend the string (for example "KMGTP") if larger units can occur
  • pull out the numeric portion of the size
  • print the new fields: unit rank, value (zero-padded to a fixed width so that a plain string sort orders it correctly) and the original line
  • sort the results in reverse, then discard the extra columns

Try it without the cut command to see what it's doing.
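To look at the sort keys without running du on a real tree, a sketch along these lines feeds a few made-up sample lines through the awk step. The sample sizes and paths are invented for illustration, show_keys is a hypothetical helper name, the numeric part is taken with $1 + 0 (awk's numeric coercion of the size field), and LC_ALL=C is set on sort only to make the ordering independent of the locale.

```shell
#!/bin/sh
# show_keys is a hypothetical helper; it reads du -h style lines on stdin
# and prints them with the "unit-rank zero-padded-value" prefix left in place.
show_keys() {
    awk '{printf "%s %08.2f\t%s\n",
        index("KMG", substr($1, length($1))),
        $1 + 0, $0}' |
    LC_ALL=C sort -r
}

# Made-up sample lines standing in for real du -h output.
printf '4.0K\t./a\n12M\t./b\n512\t./c\n1.5G\t./d\n' | show_keys
```

The lines come out in the order ./d, ./b, ./a, ./c, with keys 3 00001.50, 2 00012.00, 1 00004.00 and 0 00512.00 in front of the original du columns.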

Edit:

Here's a version which does the sorting within the AWK script and doesn't need cut (requires GNU AWK (gawk) for asorti support):

du -h0 |
   gawk 'BEGIN {RS = "\0"}
        {idx = sprintf("%s %08.2f %s",
         index("KMG", substr($1, length($1))),
         $1 + 0, $0);
         lines[idx] = $0}
    END {c = asorti(lines, sorted);
         for (i = c; i >= 1; i--)
           print lines[sorted[i]]}'

Edit: Added null record separation in order to handle potential filenames which include newlines. Requires GNU du and gawk.
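Where GNU du and gawk are not available, a roughly equivalent approach is to sort on plain kilobyte counts first and make the sizes human-readable only at the end. The following is a sketch under the assumption that only POSIX find, du -k, sort, head and awk are present; top_files is a hypothetical helper name, and filenames containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: top n regular files by disk usage using only POSIX options.
# top_files is a hypothetical name; paths containing newlines are not supported.
top_files() {
    dir=${1:-.}   # directory to scan, defaults to the current one
    n=${2:-10}    # number of entries to keep, defaults to 10
    find "$dir" -type f -exec du -k {} + |   # "kilobytes<TAB>path" per file
        sort -rn |                           # numeric sort, largest first
        head -n "$n" |
        awk '{
            size = $1; unit = "K"
            if (size >= 1048576)   { size /= 1048576; unit = "G" }
            else if (size >= 1024) { size /= 1024;    unit = "M" }
            sub(/^[0-9]+[ \t]+/, "")   # strip the kilobyte count, keep the path intact
            printf "%.1f%s\t%s\n", size, unit, $0
        }'
}
```

Called as top_files . 10, this prints up to ten lines of the form size, tab, path, with the largest file first.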

Upvotes: 7
