mr_man

Reputation: 91

Loop through sub directories to sample files

The following code selects a random sample of files (50 in this case) from directory 1 and copies them to a new folder with the same name.

However, I have hundreds of folders that I need to sample from, each copied to a new folder with the same name.

How can I adjust the first part of the code so that it loops through all sub directories and copies each sample to a new folder with the same name? (So the sample from sub dir 1 goes to a new dir 1, the sample from sub dir 2 goes to a new dir 2, and so on.)

import os
import random
import shutil

sourcedir = '/home/mrman/dataset-python/train/1/'
newdir = '/home/mrman/dataset-python/sub-train/1'

# pick 50 random entries from the source folder and copy them over
filenames = random.sample(os.listdir(sourcedir), 50)
for name in filenames:
    shutil.copy2(os.path.join(sourcedir, name), newdir)

Upvotes: 4

Views: 7702

Answers (2)

mr_man

Reputation: 91

Solution was simpler than expected (thanks to @idjaw for the tip):

import os
import random
import shutil

# folder which contains the sub directories
source_dir = '/home/mrman/dataset-python/train/'
# folder which receives the sampled copies
target_dir = '/home/mrman/dataset-python/sub-train/'

# os.walk yields (root, dirs, files) for every level of the tree
for root, dirs, files in os.walk(source_dir):

    # iterate through the immediate sub directories
    for i in dirs:

        # create a new folder with the name of the iterated sub dir
        path = os.path.join(target_dir, i)
        os.makedirs(path, exist_ok=True)

        # take a random sample, here 3 files per sub dir
        sub_dir = os.path.join(root, i)
        filenames = random.sample(os.listdir(sub_dir), 3)

        # copy the files to the new destination
        for j in filenames:
            shutil.copy2(os.path.join(sub_dir, j), path)

    # stop after the top level so nested folders are not walked
    break
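
One caveat with the loop above: random.sample raises ValueError when a sub directory holds fewer entries than the sample size, and os.listdir also returns nested folders, not only files. Below is a minimal sketch of a more defensive variant. The function name sample_subdirs and its parameters are hypothetical and chosen only for illustration.

```python
import os
import random
import shutil


def sample_subdirs(source_dir, target_dir, n):
    """Copy up to n randomly chosen files from each immediate sub directory
    of source_dir into a same-named folder under target_dir.

    Hypothetical helper for illustration.
    Returns a dict mapping each sub directory name to the copied file names.
    """
    copied = {}
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if not os.path.isdir(src):
            continue  # skip loose files sitting next to the sub directories
        # keep regular files only, so nested folders are never sampled
        files = [f for f in os.listdir(src)
                 if os.path.isfile(os.path.join(src, f))]
        # cap the sample size so small folders do not raise ValueError
        chosen = random.sample(files, min(n, len(files)))
        dst = os.path.join(target_dir, name)
        os.makedirs(dst, exist_ok=True)
        for f in chosen:
            shutil.copy2(os.path.join(src, f), dst)
        copied[name] = chosen
    return copied
```

With the paths from the question, this could be called as sample_subdirs('/home/mrman/dataset-python/train', '/home/mrman/dataset-python/sub-train', 50).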

Upvotes: 4

idjaw

Reputation: 26580

You are looking for os.walk. Check out its documentation.

Run the following to get an understanding of how it works, and read the documentation to see how it applies to your solution. os.walk traverses the entire directory tree below the path you provide, and each iteration gives you the current path, the directories at that level, and the files at that level.

Also, if you want to operate on the full path of a file or folder, make sure you use os.path.join when building that path.

your_path = "/some/path/you/want"
for path, dirs, files in os.walk(your_path):
    print(path)
    print(dirs)
    print(files)
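
To illustrate the os.path.join point, here is a small sketch that collects the full path of every file below a directory. The function name list_full_paths is made up for this example.

```python
import os


def list_full_paths(top):
    """Return the full path of every file found below top."""
    full_paths = []
    for path, dirs, files in os.walk(top):
        for name in files:
            # path is the directory currently being visited, so joining it
            # with name gives a path that can be opened or copied directly
            full_paths.append(os.path.join(path, name))
    return full_paths
```

The entries in files are bare names, so passing them to open or shutil.copy2 without joining them to path only works when the script happens to run from that same directory.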

Upvotes: 5
