ensnare

Reputation: 42093

In Python, how can I improve glob performance for a very large directory of files?

I am using Python 3.9 to recurse through a directory of files and send the files to a multiprocessing queue. The directory has 10m+ files and it can take up to 20 minutes to build and process the initial file list. How could I improve this? Perhaps there is a way to recurse through the files without loading them into memory first?

import glob

path = "/directory-of-files"

def is_valid_file(f):
    # returns True if the file meets conditions
    return True

files = glob.glob(path + '/**', recursive=True)
files = filter(is_valid_file, files)  # filter valid files only
results = pool.map(setMedia, files)

Upvotes: 3

Views: 3100

Answers (1)

Aivean

Reputation: 10882

Perhaps there is a way to recurse through the files without loading them into memory first?

glob.iglob returns an iterator which yields the same values as glob() without actually storing them all simultaneously.
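A minimal sketch of what that could look like, reusing is_valid_file and setMedia from the question and assuming a standard multiprocessing.Pool. Note that pool.map would still pull the whole iterator into a list, so pool.imap_unordered (or imap) is what keeps the pipeline lazy end to end:

import glob
import multiprocessing

path = "/directory-of-files"

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # iglob yields one path at a time instead of building a 10m-entry list
        files = glob.iglob(path + '/**', recursive=True)
        valid = filter(is_valid_file, files)
        # imap_unordered consumes the iterator lazily; chunksize cuts IPC overhead
        for result in pool.imap_unordered(setMedia, valid, chunksize=256):
            pass  # collect or handle each result here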

If that doesn't help, perhaps the bottleneck is not in pattern matching or list building, but simply in traversing all the files. You can write a simple recursive tree traversal using os.listdir and see how much time it takes to traverse your directory tree (without matching files).
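For example, a rough timing sketch along those lines (plain os.listdir, no pattern matching, path assumed to be the directory from the question):

import os
import time

def walk(path):
    # recursively yield every non-directory entry under path using plain os.listdir
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            yield from walk(full)
        else:
            yield full

start = time.perf_counter()
count = sum(1 for _ in walk("/directory-of-files"))
print(f"traversed {count} files in {time.perf_counter() - start:.1f}s")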

Upvotes: 1
