physlexic

Reputation: 858

Pulling random files out of a folder for sampling

I need a way to pull a random 10% of the files in a folder for sampling after every "run." Luckily, my current files are numbered sequentially. So my current method is to list the file names, parse the numerical portion, take the max and min values, count the number of files and multiply by 0.1, then use random.sample to get a "random [10%] sample." I also write these names to a .txt, then use shutil.copy to copy the actual files.

Obviously, this does not work if I have an outlier, e.g. a file 345.txt among other files numbered 513.txt to 678.txt. Is there a direct way to simply pull a given number of files from a folder at random? I have looked it up and cannot find a better method.
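To make the outlier problem concrete, here is a small sketch using a hypothetical list of names that mirrors the example (one 345.txt plus 513.txt to 678.txt). It counts how many numbers between the minimum and maximum have no matching file, then samples from the real names instead, which sidesteps the gap entirely.

```python
import random

# Hypothetical file names mirroring the question: one outlier plus a contiguous run.
names = ["345.txt"] + [f"{n}.txt" for n in range(513, 679)]
existing = set(names)

numbers = [int(name.split(".")[0]) for name in names]
lo, hi = min(numbers), max(numbers)

# The min/max approach assumes every number from lo to hi has a file.
missing = [n for n in range(lo, hi + 1) if f"{n}.txt" not in existing]
print(len(names), hi - lo + 1, len(missing))  # 167 334 167

# Sampling the actual names can never produce a nonexistent file.
sample = random.sample(names, len(names) // 10)
print(len(sample))  # 16
```

Half of the candidate numbers in the min/max range have no file behind them, so a number-based sample drawn from that range would often point at files that don't exist.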

Thanks.

Upvotes: 6

Views: 11756

Answers (5)

Peter

Reputation: 2361

Based on Karl's solution (which did not work for me under Win 10, Python 3.x), I came up with this:

import numpy as np
import os

# List all files in dir
files = os.listdir("C:/Users/.../Myfiles")

# Select half of the files randomly, without replacement (no duplicates)
random_files = np.random.choice(files, int(len(files)*.5), replace=False)

# Get the remaining files (a set makes the membership test fast)
chosen = set(random_files)
other_files = [x for x in files if x not in chosen]

# Do something with the files
for x in random_files:
    print(x)
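On NumPy 1.17 or later, the Generator API offers the same operation, and a seed makes a sampling run repeatable. This is a sketch with a hypothetical list of names standing in for os.listdir(...); it passes replace=False so no file is picked twice.

```python
import numpy as np

# Hypothetical names; in practice this would come from os.listdir(folder).
files = [f"{n}.txt" for n in range(513, 679)]

rng = np.random.default_rng(seed=42)  # fixed seed so a run can be reproduced
k = int(len(files) * 0.5)
random_files = rng.choice(files, size=k, replace=False)

# A set makes the membership test O(1) instead of scanning the array each time.
chosen = set(random_files.tolist())
other_files = [f for f in files if f not in chosen]

print(len(random_files), len(other_files))  # 83 83
```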

Upvotes: 0

physlexic

Reputation: 858

I was unable to get the other methods to work easily with my code, but I came up with this.

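import os
import random
import shutil

output_folder = 'C:/path/to/folder'
# subdir and files come from earlier code; random.sample avoids duplicate picks
for to_copy in random.sample(files, int(len(files) * .1)):
    shutil.copy(os.path.join(subdir, to_copy), output_folder)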
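For a fully self-contained version of the copy step (including writing the chosen names to a .txt, as described in the question), here is a sketch that builds throwaway folders with tempfile so it runs anywhere. The folder layout and the sample_list.txt file name are assumptions for illustration.

```python
import os
import random
import shutil
import tempfile

src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()

# Create hypothetical sample files 513.txt ... 678.txt plus an outlier 345.txt.
for n in [345] + list(range(513, 679)):
    with open(os.path.join(src, f"{n}.txt"), "w") as fh:
        fh.write(str(n))

files = [f for f in os.listdir(src) if os.path.isfile(os.path.join(src, f))]
chosen = random.sample(files, int(len(files) * .1))

# Record which files were picked, then copy them (the originals stay in place).
with open(os.path.join(dst, "sample_list.txt"), "w") as log:
    log.write("\n".join(chosen))
for name in chosen:
    shutil.copy(os.path.join(src, name), dst)

print(len(chosen), len(os.listdir(dst)))  # 16 17 (16 copies plus the log)
```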

Upvotes: 2

Karl Anka

Reputation: 2849

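Using numpy.random.choice(array, N, replace=False) you can select N distinct items at random from an array. (Without replace=False it samples with replacement, so the same file can be picked more than once.)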

import numpy as np
import os

# list all files in dir
files = [f for f in os.listdir('.') if os.path.isfile(f)]

# select 0.1 of the files randomly, without duplicates
random_files = np.random.choice(files, int(len(files)*.1), replace=False)
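A pathlib variant that works for any folder, not just the current working directory, may help if the snippet above returns an empty list elsewhere: os.path.isfile(f) checks a bare name relative to the current directory, so it must be joined with the folder path when listing some other folder. The helper name sample_files and its fraction/seed parameters are hypothetical, and this uses the standard library's random module rather than NumPy.

```python
import random
import tempfile
from pathlib import Path


def sample_files(folder, fraction=0.1, seed=None):
    """Return a random subset of the regular files in `folder` (hypothetical helper)."""
    # Path.iterdir() yields full paths, so is_file() checks the right location.
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    k = int(len(files) * fraction)
    return random.Random(seed).sample(files, k)  # sampling without replacement


# Usage with a throwaway folder of 50 dummy files and one subdirectory.
tmp = Path(tempfile.mkdtemp())
for n in range(50):
    (tmp / f"{n}.txt").write_text("x")
(tmp / "subdir").mkdir()

picked = sample_files(tmp, 0.1, seed=1)
print(len(picked), [p.name for p in picked])
```

The subdirectory is skipped by is_file(), so 5 of the 50 files are returned.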

Upvotes: 11

Alex Bodnya

Reputation: 118

You can use the following strategy:

  1. Use names = os.listdir(path) to get all the file names in the directory as a list.
  2. Next, count your files with n = len(names).
  3. Using n, you can get a random index like this: random_position = random.randrange(0, n)
  4. Repeat step 3, saving each new (not yet seen) value in a list, until you have enough positions (n // 10 in your case).
  5. After that you can get the required file names like this: names[random_position]

Use a for loop to iterate over the saved positions.
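The index-based steps above can be sketched like this. Two details matter: random.randrange(0, n) (not starting at 1) so the first file can be chosen, and a set so the same position is never counted twice. The hypothetical names list stands in for os.listdir(path), and the variable names avoid list and range, which would shadow Python built-ins.

```python
import random

# Hypothetical stand-in for os.listdir(path).
names = [f"{n}.txt" for n in range(513, 679)]

n = len(names)
target = n // 10

positions = set()
# Keep drawing until enough distinct positions have been collected.
while len(positions) < target:
    positions.add(random.randrange(0, n))

picked = [names[i] for i in sorted(positions)]
print(len(picked))  # 16
```

random.sample(range(n), target) collects the same kind of distinct positions in a single call.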

Hope this helps!

Upvotes: 0

Samuel Muiruri

Reputation: 522

This gets the list of file names in the folder (with mypath being the path to the folder), shuffles it, and keeps the first 10%.

from os import listdir
from os.path import isfile, join
from random import shuffle
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
shuffle(onlyfiles)  # shuffles the list in place and returns None
small_list = onlyfiles[:len(onlyfiles) // 10]  # integer division for a valid slice index

This should work
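One thing to watch with len(onlyfiles) // 10: it rounds down, so a folder with fewer than ten files yields an empty sample. If a non-empty folder should always contribute at least one file, a sketch that rounds up with integer ceiling division might look like this; the sample_size helper is hypothetical.

```python
import random


def sample_size(count, divisor=10):
    """Ceiling of count / divisor using integers only (hypothetical helper)."""
    return -(-count // divisor)


for count in (0, 7, 10, 95):
    print(count, count // 10, sample_size(count))  # floor vs. ceiling

onlyfiles = [f"{n}.txt" for n in range(7)]  # hypothetical small folder
random.shuffle(onlyfiles)                    # shuffles in place, returns None
small_list = onlyfiles[:sample_size(len(onlyfiles))]
print(small_list)
```

Integer ceiling division avoids floating-point surprises such as math.ceil(30 * 0.1) returning 4, because 30 * 0.1 evaluates to 3.0000000000000004.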

Upvotes: 2
