Tlaloc-ES

Reputation: 5282

glob.glob("**/*.jpg") as an iterator or lazy load?

I want to know if it is possible to use glob.glob("**/*.jpg") to get all the images in several folders, but as an iterator, in order to avoid filling up the memory.

Currently, I am using the following code with glob:

import glob

for file in glob.glob("**/*.jpg", recursive=True)[:1]:  # "**" only matches subfolders when recursive=True
    print(file)

but I am also using os.scandir like this:

import os

for model_folder in os.scandir(folder):  # folder is the top-level directory to scan
    for model_folder_content in os.scandir(model_folder):
        print(model_folder_content)

The problem with the first approach is that, if there are a lot of files, it can fill up the memory and fail. So the idea is to use something like scandir, because it returns an iterator, but with the option of using a pattern.

Is this possible?

Thanks

Upvotes: 2

Views: 461

Answers (3)

accdias

Reputation: 5372

You can use pathlib.Path.rglob(), which returns a generator:

>>> from pathlib import Path
>>> folder = Path('/home/accdias')
>>> jpgs = folder.rglob('*.jpg')
>>> type(jpgs)
<class 'generator'>
>>> 
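Since the generator yields paths lazily, you can iterate over it without ever building the full list in memory. A minimal sketch, assuming you search from the current directory:

from pathlib import Path

folder = Path('.')  # example starting folder; adjust as needed
for jpg in folder.rglob('*.jpg'):
    # each match is yielded one at a time as a pathlib.Path
    print(jpg)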

Upvotes: 2

bajro

Reputation: 1250

The glob module has a dedicated function for this particular problem, iglob(), which takes the same parameters as glob() and returns an iterator instead of a list.

The docs for iglob state the following:

Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

In your case, the code snippet could look something like:

import glob

for file in glob.iglob("**/*.jpg", recursive=True):
    print(file)  # do something with the file

Upvotes: 1

buran

Reputation: 14253

You can use glob.iglob():

glob.iglob(pathname, *, recursive=False)

Return an iterator which yields the same values as glob() without actually storing them all simultaneously.
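As a rough sketch of how this helps with the [:1] slicing from the question, you can combine it with itertools.islice to take only the first few matches without materialising the full list (the pattern and the count of 1 are just examples):

import glob
from itertools import islice

# recursive=True lets "**" descend into subdirectories
matches = glob.iglob("**/*.jpg", recursive=True)

# lazily take the first match instead of slicing a full list
for file in islice(matches, 1):
    print(file)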

Upvotes: 5
