huangbiubiu

Reputation: 1271

itertools.chain returns an unexpected iterator

To my understanding, Python's itertools.chain is designed to chain several iterators together.

When the first iterator yields ['a/a.jpg', 'a/b.jpg'] and the second one is empty, I expect the output to be ['a/a.jpg', 'a/b.jpg'].
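
A quick check outside the loop suggests that itertools.chain itself behaves as expected when one of its inputs is empty (the literal lists here simply stand in for the two iterators described above):

```python
import itertools

# Chaining a non-empty iterator with an empty one yields only the first one's items.
first = iter(["a/a.jpg", "a/b.jpg"])
second = iter([])
chained = list(itertools.chain(first, second))
print(chained)  # ['a/a.jpg', 'a/b.jpg']
```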

But the code below gives me a confusing result ['a/b/a.jpg', 'a/b/b.jpg']:

import itertools
import os

jpeg_paths = iter([])
# jpeg_paths = []

walk = [("a", ["a.jpg", "b.jpg"]), ("a/b", ["a.txt"])]

for dirpath, filenames in walk:
    # select image files
    jpg_filenames = filter(lambda name: str.endswith(name, "jpg"), filenames)
    # generate absolute path
    image_fullpath = map(lambda name: os.path.join(dirpath, name), jpg_filenames)

    jpeg_paths = itertools.chain(jpeg_paths, image_fullpath)
    # jpeg_paths += image_fullpath

a = list(jpeg_paths)
print(a)

Upvotes: 0

Views: 304

Answers (1)

han solo

Reputation: 6590

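The reason is lazy evaluation combined with late binding. itertools.chain, map and filter all return lazy iterators, so nothing is computed until list(jpeg_paths) iterates over them. The lambda inside map looks up dirpath when it is called, not when it is created, and by that time the loop has finished and dirpath is 'a/b'. So every file name gets joined with 'a/b'.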
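
The late-binding part can be reproduced without map or chain at all. This is a simplified sketch: it uses plain string formatting instead of os.path.join so the output does not depend on the platform's path separator.

```python
# Each lambda looks up `dirpath` when it is *called*, not when it is created.
funcs = []
for dirpath in ["a", "a/b"]:
    funcs.append(lambda name: f"{dirpath}/{name}")

# The loop has already finished, so every lambda sees dirpath == "a/b".
results = [f("a.jpg") for f in funcs]
print(results)  # ['a/b/a.jpg', 'a/b/a.jpg']
```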

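To bind the dirpath of each loop iteration, pass it to a small helper function such as mapfunc. Every call to mapfunc creates a new scope, so the lambda inside it closes over that call's dirpath rather than over the loop variable. The resulting code looks like this: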

import itertools
import os

jpeg_paths = []

walk = [("a", ["a.jpg", "b.jpg"]), ("a/b", ["a.txt"])]

def mapfunc(filenames, dirpath):  # each call gets its own `dirpath`, which the lambda below closes over
    return map(lambda name: os.path.join(dirpath, name), filenames)


for dirpath, filenames in walk:
    # select image files
    jpg_filenames = filter(lambda name: name.endswith("jpg"), filenames)
    # generate absolute path
    image_fullpath = mapfunc(jpg_filenames, dirpath=dirpath)  # binds the current `dirpath` for this iteration
    jpeg_paths = itertools.chain(jpeg_paths, image_fullpath)

print(list(jpeg_paths))
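
A common alternative to a helper function is functools.partial, which stores the current value of dirpath at the moment it is called. The sketch below keeps the answer's structure; collect_jpegs is a hypothetical wrapper name added only so the logic can be reused.

```python
import itertools
import os
from functools import partial


def collect_jpegs(walk):
    """Lazily chain the full paths of the *.jpg files for each (dirpath, filenames) pair."""
    jpeg_paths = iter([])
    for dirpath, filenames in walk:
        # select image files (this lambda uses no loop variable, so it is safe)
        jpg_filenames = filter(lambda name: name.endswith("jpg"), filenames)
        # partial() captures the current dirpath now, so later iteration
        # cannot observe a different value
        image_fullpath = map(partial(os.path.join, dirpath), jpg_filenames)
        jpeg_paths = itertools.chain(jpeg_paths, image_fullpath)
    return jpeg_paths


walk = [("a", ["a.jpg", "b.jpg"]), ("a/b", ["a.txt"])]
print(list(collect_jpegs(walk)))  # ['a/a.jpg', 'a/b.jpg'] on POSIX systems
```

The default-argument idiom, lambda name, d=dirpath: os.path.join(d, name), works the same way, because default values are evaluated when the lambda is created.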

Alternatively, you could exhaust the iterator on each iteration:

image_fullpath = tuple(map(lambda name: os.path.join(dirpath, name), jpg_filenames))

This computes the paths immediately, using the dirpath of the current iteration. However, it keeps all the resulting paths in memory, so if the directory tree you are walking is very large, it is not a good idea :)
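
If memory is the concern, a generator function is another option: it stays lazy and reads dirpath while the corresponding loop iteration is still active, so it avoids late binding without materializing anything. This is a sketch using the question's simplified two-element walk; with the real os.walk you would unpack three values, for example for dirpath, _dirnames, filenames in os.walk(root). The name iter_jpeg_paths is hypothetical.

```python
import os


def iter_jpeg_paths(walk):
    """Yield full paths of *.jpg files one at a time, never building a list."""
    for dirpath, filenames in walk:
        for name in filenames:
            if name.endswith("jpg"):
                # dirpath is read while this iteration is active,
                # so each yielded path uses the correct directory
                yield os.path.join(dirpath, name)


walk = [("a", ["a.jpg", "b.jpg"]), ("a/b", ["a.txt"])]
print(list(iter_jpeg_paths(walk)))
```

Note that a generator expression nested in the loop, such as (os.path.join(dirpath, n) for n in jpg_filenames), would still be late-bound, because only its outermost iterable is evaluated immediately.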

Upvotes: 2
