Alfe

Reputation: 59586

Unix du with progress

I'm running a plain du on a huge directory. It will probably take ages, since the storage is also network-attached.

I would like to see the progress before the end of the process so that I can already estimate what's going on. At any given time I'd like to see the already collected sum of disk usage as du counts it. I found no option for du to provide this. Did I miss something? Is there an easy way to achieve this?

I imagined something like this:

du -ba . | { s=0; while read a b; do ((s+=a)); echo $s; done; }

This would sum up the output but of course this would sum up also the accumulated directory sizes (effectively multiplying the counted sizes). I found no option to just mention the files in the output. On the other hand, using find -type f -printf "%s %p\n" instead would count hardlinks multiple times.

Is there any typical tool to achieve what I want or a simple fix to the presented script? Currently I consider writing a Python script for this but have the feeling that might be overkill.

Upvotes: 3

Views: 3222

Answers (4)

ewcz

Reputation: 13097

I think that in order to profit from the performance of the du utility vs. any custom script, one could just:

  1. download the current coreutils source from, e.g., https://ftp.gnu.org/gnu/coreutils/coreutils-8.30.tar.xz
  2. tar -xf coreutils-8.30.tar.xz && cd coreutils-8.30
  3. ./configure --prefix=/custom/location/of/modified/coreutils
  4. in ./src/du.c add after line 666 the statement print_size (&tot_dui, _("total"));

The end of the process_file function would look like:

  if ((IS_DIR_TYPE (info) && level <= max_depth)
      || (opt_all && level <= max_depth)
      || level == 0)
    {
      /* Print or elide this entry according to the --threshold option.  */
      uintmax_t v = opt_inodes ? dui_to_print.inodes : dui_to_print.size;
      if (opt_threshold < 0
          ? v <= -opt_threshold
          : v >= opt_threshold)
        print_size (&dui_to_print, file);

      print_size (&tot_dui, _("total")); /* extra statement */
    }

  return ok;
  5. make install

This makes the modified du report the running total after each entry it prints, i.e., the output could look like:

129K    ./bin/dirname
33M total
132K    ./bin/uname
33M total
207K    ./bin/sha1sum
33M total
156K    ./bin/truncate
33M total
311K    ./bin/pr
34M total
172K    ./bin/printf
34M total
138K    ./bin/pathchk
34M total

Upvotes: 1

Hielke Walinga

Reputation: 2845

If you can install it, ncdu is a nice program that does the same as du, but with a nice interface that also shows how far the scan has progressed.

On Debian, Ubuntu, etc, you can install it with

sudo apt install ncdu

Upvotes: 0

Alfe

Reputation: 59586

I came up with a small bash one-liner to solve my issue. It's not as nice as using du properly, but it gives progress information and it doesn't count hardlinks twice.

I give it here in one line and spread out to make it clearer:

find -type f -printf "%s %i %p\n" | { sum=0; declare -A inodes; while read -r size inode path; do [ "${inodes[$inode]}" != 1 ] && { inodes[$inode]=1; ((sum+=size)); echo "$sum $size $path"; }; done; }

And the same nicely formatted:

find -type f -printf "%s %i %p\n" | {
  sum=0
  declare -A inodes            # inodes already counted
  while read -r size inode path
  do
    # skip further hard links to an inode that was already counted
    [ "${inodes[$inode]}" != 1 ] && {
      inodes[$inode]=1
      ((sum+=size))
      echo "$sum $size $path"  # running total, file size, path
    }
  done
}
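
A possible variation, as a sketch only: inode numbers are unique per filesystem, so if the tree spans several mounts it may be safer to key on device and inode together. The function name du_progress and the "report every N files" parameter are made up for illustration; %D (device number) and %i are GNU find -printf directives, and the associative array needs bash 4 or later.

```shell
# Hypothetical helper: running byte total below a directory, counting each
# (device, inode) pair once and reporting every N counted files.
du_progress() {
  local dir=${1:-.} every=${2:-1000}   # default of 1000 is an arbitrary choice
  local sum=0 count=0 size dev inode key
  local -A seen=()
  find "$dir" -type f -printf '%s %D %i\n' | {
    while read -r size dev inode; do
      key="$dev:$inode"
      [[ -n ${seen[$key]:-} ]] && continue   # hard link already counted
      seen[$key]=1
      (( sum += size, count += 1 ))
      (( count % every == 0 )) && printf '%d bytes in %d files\n' "$sum" "$count"
    done
    printf '%d bytes in %d files (final)\n' "$sum" "$count"
  }
}
```

Calling du_progress /some/dir 5000 would then print a line after every 5000 distinct files plus a final line. Like the one-liner, this sums apparent file sizes (%s) and ignores the sizes of the directories themselves, so the result can differ from what du reports.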

Upvotes: 1

Ashishkumar Singh

Reputation: 3600

Maybe the command below gives you a hint on how to proceed:

ls -laR | awk '{ total += $5; if (FNR%1000 == 0) print total; }; END { print total }'

In the awk statement, you can add various conditions to check whether an entry is a directory or a link.

FNR%1000 prints the accumulated size after every thousand lines read. Instead of ls, you can use find.
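
A rough sketch of the find variant, under a few assumptions: find_progress is a hypothetical name, -printf requires GNU find, and only regular files are listed, so no directory or link checks are needed inside awk. Hard-linked files are still counted once per name here.

```shell
# Hypothetical helper: byte total of regular files below a directory,
# printed after every N files and once more at the end.
find_progress() {
  find "${1:-.}" -type f -printf '%s\n' |
    awk -v every="${2:-1000}" '
      { total += $1 }                               # one size per input line
      NR % every == 0 { printf "%.0f\n", total }    # periodic progress
      END { printf "%.0f\n", total }                # final sum
    '
}
```

For example, find_progress /some/dir 1000 reports the sum after every thousand files. printf "%.0f" is used instead of print so that large totals are not shown in exponent notation by some awk implementations.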

Upvotes: 0
