AnimNations

Reputation: 256

Finding the maximum number of files in a subdirectory

I'm trying to write a Bash script that looks through all the subdirectories of a specified folder and returns the maximum number of files found in a single subdirectory. Here is what I have right now:

#!/bin/bash   
maxCount=0 
fileCount=0 
# script that writes out all the directories and how many files are in each directory

find ./testdata/ -maxdepth 1 -mindepth 1 -type d | while read dir; do  #loop all subdirectories    
fileCount= find "$dir" -type f | wc -l #count all the files in subdirectory

    if [ $fileCount -gt $maxCount ] #if the count is higher than the max     
    then
        maxCount= "$fileCount" #set the count equal to the max
    fi

    done

#print out how many messages are in the thread    
echo "$maxCount"

First off, the variable fileCount is not being set properly. The output of find "$dir" -type f | wc -l is still being sent to stdout, and as a result the script keeps returning zero.

Example of the current output:

1
1
2
1
1
1
0

The last zero is the output of echo "$maxCount".

Not quite sure what I'm doing wrong. Thanks!

Using xfce4 terminal

Upvotes: 4

Views: 678

Answers (3)

codeforester

Reputation: 43039

You could do it a little more efficiently in pure Bash:

#!/bin/bash

# build a hash of top-level directories and file counts
declare -A file_hash
while IFS= read -r -d '' file; do  # read the null-delimited output of find
  file="${file#./}"                # strip the leading "./" that find prints
  [[ $file == */* ]] || continue   # skip files that sit directly in "."
  dir="${file%%/*}"                # extract **top dirname** from file path
  ((file_hash[$dir]++))            # increment the count for this dir
done < <(find . -type f -print0)   # find all files and output them with a null delimiter
                                   # this gracefully handles file or directory names that contain newlines

# find the top directory name with the biggest file count
max=0
max_dir=
for i in "${!file_hash[@]}"; do
  count="${file_hash[$i]}"
  ((count > max)) && { max=$count; max_dir=$i; }
done
printf 'max_dir=[%s], max_count=[%s]\n' "$max_dir" "$max"

In this approach, find scans the directory tree only once instead of running once per subdirectory. This should do well when there is a large number of directories.
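The top-level directory name can be pulled out of each path with two parameter expansions. A minimal sketch follows, using a made-up path and the hypothetical variable names naive, rel and top. Note that find . prints paths with a leading ./, so applying ${path%%/*} directly leaves only the dot.

```shell
#!/bin/bash
# Hypothetical path, in the form that `find . -type f` prints.
path="./sub2/nested/file.txt"

naive="${path%%/*}"   # longest suffix starting at the first "/" removed: leaves "."
rel="${path#./}"      # leading "./" removed: sub2/nested/file.txt
top="${rel%%/*}"      # first path component: sub2

printf 'naive=[%s] top=[%s]\n' "$naive" "$top"
```

This prints naive=[.] top=[sub2], which is why the leading ./ has to be stripped before the first component is taken.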

Upvotes: 0

armnotstrong

Reputation: 9065

You could do what you want with the following command, which takes advantage of find's -exec option:

find ./testdata/ -maxdepth 1 -mindepth 1 -type d -exec bash -c 'find "$1" -type f | wc -l' _ {} \; | sort -n | tail -n 1

As for your approach, in this line

fileCount= find "$dir" -type f | wc -l #count all the files in subdirectory

there should be no space between = and find, and you need command substitution to assign the value to the variable fileCount, like this:

fileCount=$(find "$dir" -type f | wc -l)
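To see the difference in isolation, here is a small sketch. It assumes a scratch directory created with mktemp and uses the made-up variable names before and after. With a space after =, the shell treats fileCount= as an empty assignment that applies only to the environment of the find command, so the shell variable keeps its old value and the count goes to stdout.

```shell
#!/bin/bash
# Hypothetical scratch directory holding three empty files.
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/b" "$tmp/c"

fileCount=0
# Space after '=': the empty assignment is passed to find's environment only;
# the shell variable is untouched and the count is written to stdout
# (discarded here so it does not clutter the output).
fileCount= find "$tmp" -type f | wc -l >/dev/null
before=$fileCount

# Command substitution captures the pipeline's output in the variable.
fileCount=$(find "$tmp" -type f | wc -l)
after=$((fileCount))   # arithmetic expansion drops any padding wc may add

echo "before=$before after=$after"
rm -rf "$tmp"
```

The first form leaves the variable at 0, while the second stores the count of 3.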

And if you want to stick with the while loop:

find . -maxdepth 1 -mindepth 1 -type d | while IFS= read -r dir; do
    cnt=$(find "$dir" -type f | wc -l)
    echo "$cnt"
done | sort -n | tail -n 1

Upvotes: 4

AnimNations

Reputation: 256

Corrected script:

#!/bin/bash   
maxCount=0 
fileCount=0 
# script that writes out all the directories and how many files are in each directory

find ./testdata/ -maxdepth 1 -mindepth 1 -type d | { while read dir; do  #loop all subdirectories    
fileCount=$(find "$dir" -type f | wc -l) #count all the files in subdirectory

    if [ $fileCount -gt $maxCount ] #if the count is higher than the max     
    then
        maxCount=$fileCount #set the count equal to the max
    fi

    done

#print out how many messages are in the thread    
echo "$maxCount"; }

Changes:

fileCount=$(find "$dir" -type f | wc -l)

Used Command Substitution to properly set fileCount variable to correct value

{ while read dir; do ... echo "$maxCount"; }

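Used command grouping so that echo runs in the same subshell as the while loop. Because the loop is on the receiving end of a pipe, a value assigned to maxCount inside it would otherwise be lost when the loop ends.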
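The scoping problem can be reproduced on its own. In Bash, each part of a pipeline runs in a subshell by default, so assignments made inside a piped while loop disappear once the loop finishes. The sketch below assumes a scratch directory created with mktemp and uses the made-up names lostMax and keptMax. It contrasts the piped loop with a loop fed by process substitution, an alternative to command grouping that keeps the loop in the current shell.

```shell
#!/bin/bash
# Hypothetical test tree: sub1 holds 1 file, sub2 holds 3 files.
tmp=$(mktemp -d)
mkdir "$tmp/sub1" "$tmp/sub2"
touch "$tmp/sub1/a" "$tmp/sub2/a" "$tmp/sub2/b" "$tmp/sub2/c"

# Piped loop: the while body runs in a subshell, so lostMax is still 0 afterwards.
lostMax=0
find "$tmp" -maxdepth 1 -mindepth 1 -type d | while IFS= read -r dir; do
    n=$(find "$dir" -type f | wc -l)
    if [ "$n" -gt "$lostMax" ]; then lostMax=$((n)); fi
done

# Process substitution: the loop runs in the current shell, so keptMax survives.
keptMax=0
while IFS= read -r dir; do
    n=$(find "$dir" -type f | wc -l)
    if [ "$n" -gt "$keptMax" ]; then keptMax=$((n)); fi
done < <(find "$tmp" -maxdepth 1 -mindepth 1 -type d)

echo "piped loop: $lostMax, process substitution: $keptMax"
rm -rf "$tmp"
```

With default shell options the piped loop reports 0 and the process-substitution loop reports 3. Command grouping is the smaller change to the original script; process substitution is handy when the result is needed later in the script.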

Hope this helps others in the future!

Upvotes: 2
