CH J

Reputation: 35

How to reduce memory usage of python os.listdir for large number of files

I am using os.listdir() to get the contents of a directory. I would like a nice way to get the list of files in a directory, limited to a specified maximum number of entries.

os.listdir() works nicely, but it has no way to limit how many entries it scans, and as a result the process is sometimes killed by running out of memory.

os.listdir(path) works well by itself. But it builds a list as large as the number of files in the directory, e.g.:

100 entries if there are 100 files in a directory.

1000 entries if there are 1000 files in a directory.

In other words, the returned list grows with the number of files, and sometimes the process gets killed for using too much memory.

But what happens if there are over one million files in a directory?

I searched os.listdir(), os.scandir(), and os.walk(), but none of them supports limiting the number of files to scan.

dirs = os.listdir(path)
print(len(dirs))

So, I am looking for a function like the following, which could save memory.

OLD WAY

list = os.listdir(path)   # returns the full list of files in a path

NEW WAY

list = os.listdir(path, maxscan=100)  # returns at most 100 files in a path
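For illustration, here is a rough sketch of the behavior I have in mind (maxscan is just my own hypothetical parameter name), written as a wrapper that cuts off the lazy iterator from os.scandir() early instead of building the full list:

import os
from itertools import islice

def listdir_limited(path, maxscan=100):
    # os.scandir() yields entries lazily, so stopping after
    # maxscan entries avoids holding the whole directory in memory.
    with os.scandir(path) as entries:
        return [entry.name for entry in islice(entries, maxscan)]

print(listdir_limited('.', maxscan=100))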

Upvotes: 1

Views: 3395

Answers (1)

m01010011

Reputation: 1122

Use glob.iglob and loop over the generator:

import glob

for f in glob.iglob('subdir/*'):
    print(f)

Source: Python Docs

Additionally, if you want to process the files in batches of, say, 100, you can easily modify the code above; each pass like the following consumes the next 100 entries from the generator:

import glob

folder_contents = glob.iglob('subdir/*')
for _, f in zip(range(100), folder_contents):
    # zip stops after 100 pairs, so only 100 entries are pulled from the generator
    print(f)
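To make the batching explicit, one possible sketch (my own addition, not part of the answer above) loops with itertools.islice over the same generator until it is exhausted, holding at most 100 names in memory per batch:

import glob
from itertools import islice

folder_contents = glob.iglob('subdir/*')
while True:
    # islice pulls the next 100 entries from the generator;
    # an empty batch means the directory has been fully scanned.
    batch = list(islice(folder_contents, 100))
    if not batch:
        break
    for f in batch:
        print(f)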

Upvotes: 1
