Dario Spagnolo

How to stop find after a certain number of results

I am trying to make certain that a specific directory has at least 100 files of at least 1 MB each. The search has to be recursive because there are many subdirectories. I cannot wait until I get a list of all files over 1 MB, because the directory has millions of files and it would take too long.

So I expected the following command to work:

find -size +1M | head -n 100

There are plenty of files over 1 MB in my directory, so it should take only seconds before head returns the first 100 lines. But it takes a lot longer.

If I run find -size +1M on its own, I get many results very quickly, even more so when it is run twice in a row and the filesystem cache is warm.

So I wonder why head doesn't return as soon as the first 100 files are found.

On the other hand, if I omit the -size test, it works just fine:

find | head -n 100

This returns immediately with a list of 100 files.

I am running GNU/Linux Debian 7.4 (Wheezy) with kernel 3.2.0-4-amd64. The filesystem is ext4 on top of an LVM volume on a single RAID1 array. It has 9638853 used inodes (6%), a capacity of 2.7 TB and 682 GB free.

Answers (1)

p4sh4

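That's how output buffering works. When find's standard output is a pipe rather than a terminal, the C library fully buffers it: find collects its results in a buffer (typically 4 KiB) and only writes them to the pipe when that buffer is full. So head receives nothing until find has found enough matching files to fill the buffer, and with -size +1M filtering out most files, that can take a long time. The same buffering delays the end of the pipeline: after head has printed its 100 lines and exited, find keeps searching and is only terminated by SIGPIPE the next time it flushes its buffer into the now-closed pipe.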
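A pipeline is only finished when every command in it has exited, and a command writing into a pipe whose reader has gone away is stopped by SIGPIPE only when it next writes. A minimal POSIX sh sketch of that mechanism, using yes(1) as a stand-in for find (the temporary status file is just an assumed way to capture the exit status of yes without relying on bash's PIPESTATUS):

```sh
#!/bin/sh
# Sketch: the writer in a pipeline is stopped only when it writes after the
# reader has exited. yes(1) writes "y" lines forever; head exits after 3
# lines, and the next write by yes to the closed pipe raises SIGPIPE.
status_file=$(mktemp)
{ yes; echo "$?" > "$status_file"; } | head -n 3
yes_status=$(cat "$status_file")
rm -f "$status_file"
# Typically 141 (128 + 13, SIGPIPE); if SIGPIPE is ignored in the calling
# environment, yes instead fails with "Broken pipe" and a nonzero status.
echo "yes exited with status $yes_status"
```

Because yes writes continuously, it dies almost immediately after head exits. A find -size +1M that writes through a block buffer may go a long time before its next write, which is the delay described in the question.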

When you omit -size, every file matches, so find fills its output buffer almost instantly: head gets its 100 lines right away, and find is killed as soon as it writes its next block into the closed pipe.

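If you run find -size +1M alone, its output goes to your terminal, where it is line-buffered, so each match is printed as soon as it is found; that is why many results appear so quickly. They are not all the results, though: if you let it run to completion, it takes a long time too.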

You can use stdbuf (from GNU coreutils) to change the buffering of a command's standard streams. For example,

stdbuf -oL -eL find -size +1M | head -n 100

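will line-buffer stdout and stderr for find, so each match is written to the pipe as soon as it is found. head then prints each result as soon as find reports it, and find is killed by SIGPIPE on its first write after head has exited, instead of scanning the rest of the tree.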
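Applied to the original goal (at least 100 files over 1 MiB), a minimal sketch might look like the following. It assumes GNU find and GNU coreutils (stdbuf, head, wc); the function name has_big_files is hypothetical, and the -type f test (so that directories are not counted) is an addition not present in the commands above. Note that -size +1M matches files strictly larger than 1 MiB, because find rounds sizes up to whole units before comparing.

```sh
#!/bin/sh
# Sketch only: succeed if directory "$2" holds at least "$1" regular files
# larger than 1 MiB, stopping the scan as soon as enough have been found.
# Assumes GNU find and GNU coreutils; has_big_files is a hypothetical name.
has_big_files() {
    needed=$1
    dir=$2
    # stdbuf -oL line-buffers find's stdout, so every match reaches head at
    # once; after head has read "$needed" lines and exited, find is killed by
    # SIGPIPE on its next write instead of walking the rest of the tree.
    # Errors such as unreadable directories are discarded. Matches are
    # counted as lines, so a file name containing a newline counts twice.
    found=$(stdbuf -oL find "$dir" -type f -size +1M 2>/dev/null |
            head -n "$needed" | wc -l)
    [ "$found" -ge "$needed" ]
}
```

For example, `has_big_files 100 . && echo "at least 100 files over 1 MiB"` checks the current directory. Counting with head -n "$needed" rather than collecting the full list is what lets the check finish early on a tree with millions of files.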
