Reputation: 834
I get different output when using a list comprehension versus a generator comprehension. Is this expected behavior or a bug?
Consider the following setup:
all_configs = [
{'a': 1, 'b':3},
{'a': 2, 'b':2}
]
unique_keys = ['a','b']
If I then run the following code, I get:
print(list(zip(*( [c[k] for k in unique_keys] for c in all_configs))))
>>> [(1, 2), (3, 2)]
# note the ( vs [
print(list(zip(*( (c[k] for k in unique_keys) for c in all_configs))))
>>> [(2, 2), (2, 2)]
This is on Python 3.6.0:
Python 3.6.0 (default, Dec 24 2016, 08:01:42)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Upvotes: 26
Views: 2306
Reputation: 140178
To see what's going on, replace c[k] with a function that has a side effect:
def f(c, k):
    print(c, k)
    return c[k]
print("listcomp")
print(list(zip(*( [f(c,k) for k in unique_keys] for c in all_configs))))
print("gencomp")
print(list(zip(*( (f(c,k) for k in unique_keys) for c in all_configs))))
output:
listcomp
{'a': 1, 'b': 3} a
{'a': 1, 'b': 3} b
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} b
[(1, 2), (3, 2)]
gencomp
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} b
{'a': 2, 'b': 2} b
[(2, 2), (2, 2)]
c in the generator expressions is evaluated after the outer loop has completed, so c bears the last value it took in the outer loop.
In the list comprehension case, c is evaluated at once.
(Note also the order of the calls, abab for the list comprehensions versus aabb for the generator expressions: each list comprehension runs to completion at once, while the generators are advanced in turn as zip consumes them.)
Note that you can keep the "generator" way of doing it (not creating the temporary list) by passing c.get to map, so the current value of c is stored:
print(list(zip(*( map(c.get,unique_keys) for c in all_configs))))
In Python 3, map does not create a list, but the result is still OK: [(1, 2), (3, 2)]
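A minimal sketch of why the map approach works, assuming the same all_configs and unique_keys as in the question: the attribute lookup c.get happens when map() is called, so each map object holds a bound method of the dict that was current at that moment, even though the lookups themselves run later. The variant with c.__getitem__ is an addition, not part of the answer above; it keeps the KeyError that c[k] would raise for a missing key, whereas c.get silently returns None.

```python
all_configs = [
    {'a': 1, 'b': 3},
    {'a': 2, 'b': 2},
]
unique_keys = ['a', 'b']

# map() is lazy, but c.get is resolved when map() is called, so each
# map object is tied to its own dict rather than to the name c.
with_get = list(zip(*(map(c.get, unique_keys) for c in all_configs)))

# Same idea, but a missing key raises KeyError instead of yielding None.
with_getitem = list(zip(*(map(c.__getitem__, unique_keys) for c in all_configs)))

print(with_get)      # [(1, 2), (3, 2)]
print(with_getitem)  # [(1, 2), (3, 2)]
```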
Upvotes: 12
Reputation: 250951
This happens because the zip(*...) call evaluates the outer generator, and the outer generator returns two more generators:
((c[k], print(c)) for k in unique_keys)
Evaluating the outer generator moved c to the second dict: {'a': 2, 'b': 2}.
Now, when these generators are evaluated individually, they look up c, and since its value is now {'a': 2, 'b': 2}, you get [(2, 2), (2, 2)] as output.
Demo:
>>> def my_zip(*args):
...     print(args)
...     for arg in args:
...         print(list(arg))
...
>>> my_zip(*((c[k] for k in unique_keys) for c in all_configs))
Output:
# We have two generators now, which means the outer generator has already looped through `all_configs`.
(<generator object <genexpr>.<genexpr> at 0x104415c50>, <generator object <genexpr>.<genexpr> at 0x10416b1a8>)
[2, 2]
[2, 2]
The list comprehension, on the other hand, evaluates right away and fetches the current value of c, not its last value.
How to make the generators use the correct value of c?
Use an inner function and a generator function. The inner function can remember c's value through a default argument.
>>> def solve():
...     for c in all_configs:
...         def func(c=c):
...             return (c[k] for k in unique_keys)
...         yield func()
...
>>>
>>> list(zip(*solve()))
[(1, 2), (3, 2)]
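The same effect can be sketched without the default-argument trick by moving the inner loop into its own generator function. The name row_values is hypothetical and not from the answer above. Calling a generator function binds its arguments at call time, so each generator keeps its own dict even though its body only runs when zip pulls values from it.

```python
all_configs = [
    {'a': 1, 'b': 3},
    {'a': 2, 'b': 2},
]
unique_keys = ['a', 'b']

def row_values(config, keys):
    """Lazily yield config[k] for each key (hypothetical helper)."""
    # config is a local name bound when row_values() is called,
    # so later changes to the loop variable c cannot affect it.
    for k in keys:
        yield config[k]

columns = list(zip(*(row_values(c, unique_keys) for c in all_configs)))
print(columns)  # [(1, 2), (3, 2)]
```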
Upvotes: 6
Reputation: 49318
In a list comprehension, expressions are evaluated eagerly. In a generator expression, they are only looked up as needed.
Thus, as the outer generator expression iterates over for c in all_configs, the inner one refers to c[k] but only looks up c after the loop is done, so it uses the latest value of c for both rows. By contrast, the list comprehension is evaluated immediately, so it creates one list with the first value of c and another list with the second value of c.
Consider this small example:
>>> r = range(3)
>>> i = 0
>>> a = [i for _ in r]
>>> b = (i for _ in r)
>>> i = 3
>>> print(*a)
0 0 0
>>> print(*b)
3 3 3
When creating a, the interpreter created that list immediately, looking up the value of i as soon as it was evaluated. When creating b, the interpreter just set up that generator and didn't actually iterate over it or look up the value of i. The print calls told the interpreter to evaluate those objects. a already existed as a full list in memory with the old value of i, but b was only evaluated at that point, and when it looked up the value of i, it found the new value.
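One detail worth adding as a sketch: not every part of a generator expression is deferred. The iterable of the leftmost for clause is evaluated when the generator is created, while the rest is evaluated on each step. The names data and factor below are made up for illustration. This is also why, in the question, unique_keys is fixed when each inner generator is created, but c is looked up late.

```python
data = [1, 2, 3]
factor = 10

# data is evaluated right here (the generator stores an iterator over
# [1, 2, 3]); factor is only looked up when values are requested.
gen = (x * factor for x in data)

data = [100]  # no effect: gen already iterates over the old list
factor = 2    # takes effect: factor is read on every step

result = list(gen)
print(result)  # [2, 4, 6]
```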
Upvotes: 37
Reputation: 209
Both are generator objects. The first is just a generator, and the second is a generator inside a generator:
print list( [c[k] for k in unique_keys] for c in all_configs)
[[1, 3], [2, 2]]
print list( (c[k] for k in unique_keys) for c in all_configs)
[<generator object <genexpr> at 0x000000000364A750>, <generator object <genexpr> at 0x000000000364A798>]
When you use zip(*...) on the first expression, nothing special happens, because it is a single generator that returns the same lists list() would. So it returns the output you would expect. In the second case, zip receives the generators themselves, building one list from the first generator and one from the second. Those generators on their own give a different result than the generator in the first expression.
This would be the list comprehension:
print [c[k] for k in unique_keys for c in all_configs]
[1, 2, 3, 2]
Upvotes: -1