sjsam

Reputation: 21965

Counting total files in directory - find vs ls

Is there a reason why

find . -mindepth 1 -maxdepth 1 | wc -l

is recommended over

ls -1 | wc -l

(or vice versa?)

to count the total number of files/directories inside a folder

Notes:

  1. This question is concerned only with counting entries.
  2. There are no files with a leading .
  3. There may be non-standard file names containing, say, a \n.

Upvotes: 5

Views: 1383

Answers (3)

Marco Carlo Moriggi

Reputation: 414

May I add some more?

Reasons to use find instead of ls

As stated by mogsie the main reason is about performance:

  • ls sorts its output by default (by name), so it must wait for the whole listing to be returned by the OS and sort it before printing anything to standard output
  • find, on the other hand, has no sorting capabilities, so it evaluates entries as the OS returns each buffer of nodes, potentially before the whole list has been read, and never needs to sort them.

Effective solution

Disclosure: I used this solution in production to count the entries of a directory with about 300k items.

find . -mindepth 1 -maxdepth 1 -printf '.' | wc -m 

Basically, this prints one dot to standard output for every filesystem entry, then counts the printed characters.

The big advantage regarding file names is easy to see: they are never used, so odd characters cannot skew the count. The performance advantage is that no file attribute has to be read just to count entries (as you would expect from something that counts the files in a directory), unless you add a filter.
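As a sketch of that filtering remark (-type f is a standard find test, but this variation is mine, not part of the answer above), the same counting trick can be restricted to regular files only:

```shell
#!/bin/sh
# Count only regular files in the current directory (skip
# subdirectories, symlinks, etc.); the dot-printing trick is unchanged.
find . -mindepth 1 -maxdepth 1 -type f -printf '.' | wc -m
```

Note that adding the filter costs a stat on each entry, so it trades a little of the raw speed for selectivity.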

If you want to start the count, walk away, and come back later to see how many items have been found, you can redirect the standard output to a file (possibly on a tmpfs, so nothing is written to disk), detach it from the shell, and count the characters in the file whenever you like:

nohup find . -mindepth 1 -maxdepth 1 -printf '.' > /tmp/count.txt &

Counting the dots in the file then gives you the current count:

wc -m /tmp/count.txt

...and if you want to watch the counter update live:

watch wc -m /tmp/count.txt

Upvotes: 2

mogsie

Reputation: 4156

The reason find(1) is preferred to ls(1) is that

  • ls defaults to sorting the list of files
  • find has no sorting capability

Sorting can be extremely memory-consuming for large data sets. So even though you can use ls -f or ls -U to disable sorting, I find that using find is safer, because I know the directory listing won't be sorted no matter what options are passed to it.

In any case, telling the command to print less about each file helps both performance and correctness: performance, because the command can avoid the stat(2) call; correctness, because if you print only the inode, for example, the name of the file cannot affect the output (e.g. line breaks, carriage returns or other odd characters).
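A sketch of the inode idea (my phrasing; %i is GNU find's format directive for the inode number): print one inode per entry, so the file names never appear in the output being counted.

```shell
#!/bin/sh
# Print each entry's inode number instead of its name; a name with an
# embedded newline can no longer add extra lines to the output.
find . -mindepth 1 -maxdepth 1 -printf '%i\n' | wc -l
```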

Upvotes: 2

larsks

Reputation: 311586

The first command...

find . -mindepth 1 -maxdepth 1 | wc -l

...will list files and directories that start with ., while your ls command will not. The equivalent ls command would be:

ls -A | wc -l

Both will give you the same answer. As folks pointed out in the comments, both will give you the wrong answer if there are entries with embedded newlines, because the above commands are simply counting lines of output.
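To illustrate the hidden-entry difference (a sketch, not from the answer itself): with a dot-file present, plain ls undercounts relative to find, while ls -A agrees with it.

```shell
#!/bin/sh
# One visible entry and one hidden entry.
dir=$(mktemp -d)
touch "$dir/visible" "$dir/.hidden"
ls "$dir" | wc -l                             # misses the dot-file
ls -A "$dir" | wc -l                          # counts it
find "$dir" -mindepth 1 -maxdepth 1 | wc -l   # counts it too
rm -rf "$dir"
```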

Here's one way to count the number of files that is independent of filename quirks:

find . -mindepth 1 -maxdepth 1 -print0 | xargs -0 -I{} echo | wc -l

This passes the file names to xargs NUL-terminated rather than newline-separated; xargs then prints one blank line per file, and we count the lines of output from xargs.
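A related newline-safe variant (my assumption, not part of the answer): since -print0 emits exactly one NUL byte per entry, you can count the NUL terminators directly instead of round-tripping through xargs.

```shell
#!/bin/sh
# tr -cd '\0' deletes every byte except NUL; wc -c then counts the
# remaining bytes, i.e. one per directory entry.
find . -mindepth 1 -maxdepth 1 -print0 | tr -cd '\0' | wc -c
```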

Upvotes: 5
