srcerer

Reputation: 1098

Quickly list random set of files in directory in Linux

Question: I am looking for a performant, concise way to list N randomly selected files in a Linux directory using only Bash. The files must be randomly selected from different subdirectories.

Why I'm asking: In Linux, I often want to test a random selection of files in a directory for some property. The directories contain thousands of files, so I only want to test a small number of them, but I want to take them from different subdirectories of the directory of interest.

The following returns the paths of 50 "randomly"-selected files:

find /dir/of/interest/ -type f | sort -R | head -n 50

The directory contains many files and resides on a mounted file system with slow read times (accessed through ssh), so the command can take many minutes. I believe the issue is that find has to walk the entire tree first (slow), and sort -R cannot print anything until it has read every path; only then is a random selection printed.

Upvotes: 0

Views: 625

Answers (3)

Paul Hodges

Reputation: 15313

How often do you need it? Do the work periodically in advance to have it quickly available when you need it.

Create a refreshList script in your home directory and make it executable (chmod +x ~/refreshList).

#!/usr/bin/env bash

find /dir/of/interest/ -type f | sort -R | head -n 50 >/tmp/rand.list
mv -f /tmp/rand.list ~

Put it in your crontab.

0 7-20 * * 1-5 nice -n 19 ~/refreshList

Then, between 07:00 and 20:00 on weekdays, you will always have a ~/rand.list that's at most an hour old.

If you don't want to use cron and aren't too picky about how old the list is, write a function that prints the current list and then refreshes it in the background, so the next call returns immediately.

randFiles() {
  cat ~/rand.list    # print the list built by the previous call
  {                  # then rebuild it in the background for next time
    find /dir/of/interest/ -type f | sort -R | head -n 50 >/tmp/rand.list
    mv -f /tmp/rand.list ~
  } &
}

Upvotes: 2

Kyle Banerjee

Reputation: 2794

If you can't run locate and the find command is too slow, is there any reason this has to be done in real time?

Would it be possible to use cron to dump the output of the find command into a file and then do the random pick out of there?
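
A minimal sketch of that idea, assuming GNU coreutils (shuf, mktemp) are available. The LIST_FILE variable and both function names are made up for illustration; nothing here is specific to your setup. The slow find runs once (for example from cron), and the random pick then only reads a local text file:

```shell
#!/usr/bin/env bash
# Sketch of the cached-list approach. LIST_FILE and both function names are
# illustrative assumptions, not an established tool or convention.
LIST_FILE="${LIST_FILE:-$HOME/files.list}"

# Slow step: walk the tree once and cache every file path (run this from cron).
refresh_file_list() {
  local dir=$1 tmp
  # Write to a temporary file next to the list so readers never see a partial list.
  tmp=$(mktemp "${LIST_FILE}.XXXXXX") || return 1
  if find "$dir" -type f > "$tmp"; then
    mv -f "$tmp" "$LIST_FILE"
  else
    # find reported an error (e.g. unreadable directory): keep the old list.
    rm -f "$tmp"
    return 1
  fi
}

# Fast step: draw up to N distinct lines (default 50) from the cached list.
pick_random_files() {
  shuf -n "${1:-50}" "$LIST_FILE"
}
```

After refresh_file_list /dir/of/interest/ has run once, pick_random_files 50 returns quickly because it never touches the slow mount. The list is line-based, so file names containing newlines would not survive it, and files created or deleted since the last refresh will be missing or stale.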

Upvotes: 0

James Brown

Reputation: 37414

If locate is available and updatedb refreshes its database regularly (daily is probably the default), you could query the database instead of walking the tree:

$ locate /home/james/test | sort -R | head -5
/home/james/test/10kfiles/out_708.txt
/home/james/test/10kfiles/out_9637.txt
/home/james/test/compr/bar
/home/james/test/10kfiles/out_3788.txt
/home/james/test/test
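
Note that locate prints directories as well as files, and its database can name files that have since been deleted. If you need regular files only, a small filter can help. This is just a sketch: keep_regular_files is a hypothetical helper name, not part of locate, and shuf is assumed to be the GNU coreutils one.

```shell
#!/usr/bin/env bash
# keep_regular_files is a hypothetical helper, not part of locate.
# It reads one path per line on stdin and prints only the paths that
# currently exist as regular files.
keep_regular_files() {
  local path
  while IFS= read -r path; do
    [ -f "$path" ] && printf '%s\n' "$path"
  done
  return 0
}

# Usage (needs an up-to-date locate database; the path is the example above):
#   locate /home/james/test | shuf -n 20 | keep_regular_files | head -n 5
```

Shuffling before filtering means only the 20 sampled paths are checked against the file system, which matters on a slow mount. The oversampling factor (20 candidates for 5 results) is an arbitrary choice; raise it if too many candidates turn out to be directories.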

Upvotes: 2
