Reputation: 939
I'm looking for a shell command to run in Bash that finds the subdirectory containing the largest number of files. Execution time isn't a huge concern; it's clear that a big trawl/sort operation will be needed to determine the result. The question is: how do I compute this?
My first thought was to use a command of the form
find -type d -exec find {} -maxdepth 1 -type f | wc -l
but it turns out that you can't pipe within a find command like that: the invoking shell parses the pipe before find ever sees it, so the pipe applies to the outer find as a whole rather than to each -exec invocation (and find then also complains that -exec is missing its terminating \;).
Upvotes: 0
Views: 37
Reputation: 46856
So ... a find-based option could work, and you can still pipe as long as what you exec is a shell.
For example, perhaps something like this, to get a list:
find /path -type d -exec sh -c 'find "$0" -maxdepth 1 -type f | wc -l' {} \; -print | paste - -
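(An aside: {} lands in sh -c as $0 here. A sketch of the slightly more conventional spelling passes a throwaway placeholder (the name sh below is arbitrary) as $0 and the directory as $1:
find /path -type d -exec sh -c 'find "$1" -maxdepth 1 -type f | wc -l' sh {} \; -print | paste - -
Either way, each output line is the count, a tab, then the path.)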
But .. I'd probably do this in pure bash:
shopt -s globstar nullglob    # **/ matches every directory recursively; empty globs vanish
for d in **/; do
    # a holds every entry in $d, b holds only the subdirectories;
    # their difference is the number of non-directory entries
    printf '%s\t%s\n' "$( cd "$d"; a=(*); b=(*/); echo $(( ${#a[@]} - ${#b[@]} )) )" "$d"
done
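If the cd subshell feels like overhead, here is a sketch of the same count taken by globbing with the directory prefix instead (same globstar/nullglob settings assumed):
for d in **/; do
    a=("$d"*); b=("$d"*/)    # all entries vs. subdirectories only
    printf '%s\t%s\n' $(( ${#a[@]} - ${#b[@]} )) "$d"
done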
In both of these cases, the result can be sorted numerically and trimmed with a pipe:
| sort -nr | head -1
or if you're sensitive to too many pipes, with a tiny awk script (seeded on the first line so that an all-zero result still prints something):
| awk 'NR==1 || $1>n {n=$1; line=$0} END {print line}'
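Assembled end to end, the find variant (still using the placeholder /path) reads:
find /path -type d -exec sh -c 'find "$0" -maxdepth 1 -type f | wc -l' {} \; -print | paste - - | sort -nr | head -1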
I'm not sure which of these is simpler, find or bash. I would expect the find solution to run faster, but I'd love to hear your results with each.
Note that globstar requires bash 4 or above.
Upvotes: 2
Reputation: 939
It turns out that the way to do this is with a bash script. This should produce the intended results:
(
find . -type d | while IFS= read -r path ; do
    # reading line by line keeps paths with spaces intact,
    # and assigning the output to a variable strips the trailing newline
    files=$(find "$path" -maxdepth 1 -type f | wc -l)
    printf '%s %s\n' "$files" "$path"
done
) | sort -rg | head
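For what it's worth, here is a more defensive sketch of the same loop, using find's -print0 (both GNU and BSD find have it) with a null-delimited read so that oddly named directories are read safely; the final sort is still line-oriented, so a path that itself contains a newline would still garble the listing:
while IFS= read -r -d '' path ; do
    files=$(find "$path" -maxdepth 1 -type f | wc -l)
    printf '%s\t%s\n' "$files" "$path"
done < <(find . -type d -print0) | sort -rg | head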
Upvotes: 0