Christopher Rucinski

Reputation: 4867

Tuple Unpacking with List Comprehension fails but works with for-loop

Summary

I have used a semi-complex regex to retrieve data from a website. The issue I have is that I have to do some post-processing of the matched dataset.

I have gotten the data processing to probably 95+% of where I want it. However, I am getting a simple error message that I cannot reason about, which is strange.

I can work around it, but that is beside the point. I am trying to figure out whether this is a bug or something I am fundamentally overlooking about tuple unpacking.

Background Info

One thing I have to overcome is that I get 4 matches for every "true match". That means the data for a single item is spread out over 4 matches.

In simple graphical form (slightly oversimplified):

index |  a    b    c    d    e    f    g    h    i    j 
--------------------------------------------------------
   1: | ( ), ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( )
   2: | (█), (█), (█), (█), ( ), ( ), ( ), ( ), ( ), ( )
   3: | ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( ), ( )
   4: | ( ), ( ), ( ), ( ), ( ), ( ), (█), (█), (█), (█)

   5: | ( ), ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( )
   6: | (▒), (▒), (▒), (▒), ( ), ( ), ( ), ( ), ( ), ( )
   7: | ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( ), ( )
   8: | ( ), ( ), ( ), ( ), ( ), ( ), (▒), (▒), (▒), (▒)

   9: | ...
        ...
 615: | ...

I can get all the data, but I want to compact it, like so...

index |  a    b    c    d    e    f    g    h    i    j 
--------------------------------------------------------
   1: | (█), (█), (█), (█), (█), (█), (█), (█), (█), (█)
   2: | (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒)

   3: | ...
        ...
 154: | ...
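For comparison, the compaction shown in the second diagram can also be sketched without any index arithmetic. This is a different technique from the index % 4 filtering used below, offered only as a sketch: it assumes every true match spans exactly four consecutive rows and that each column is non-empty in at most one of those rows. The helpers chunks_of and merge_rows are hypothetical names, and the data is the first eight rows of matches from the question.

```python
from typing import Iterator, Sequence


def chunks_of(rows: Sequence[tuple], size: int) -> Iterator[tuple]:
    """Yield consecutive groups of `size` rows (hypothetical helper)."""
    # Passing the same iterator `size` times makes zip pull `size` rows per step;
    # an incomplete trailing group is silently dropped.
    it = iter(rows)
    return zip(*[it] * size)


def merge_rows(group: Sequence[tuple]) -> tuple:
    """Collapse one group of rows into a single row (hypothetical helper)."""
    # zip(*group) turns the rows of the group into columns; for each column,
    # keep the first non-empty string, or "" if every row left it empty.
    return tuple(next((value for value in column if value), "") for column in zip(*group))


# First eight rows of `matches` from the question: two "true matches".
matches = [
    ('', '', '', '', '', '', '', '', '', ''),
    ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''),
    ('', '', '', '', 'October 10, 2019', '', '', '', '', ''),
    ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'),
    ('', '', '', '', '', 'stable', '', '', '', ''),
    ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''),
    ('', '', '', '', 'October 2, 2019', '', '', '', '', ''),
    ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'),
]

compact = [merge_rows(group) for group in chunks_of(matches, 4)]
for row in compact:
    print(row)
```

Here an empty f column simply stays ""; turning it into "preview", as the failing code below does, would be a separate step.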

Code

Works

Take note of the variables abcd, e, f, and ghij, and how I have to unpack them in the for-loop at the bottom.

matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]

f = [
    f
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
abcd = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
ghij = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
]

abcdefghij = zip(abcd, e, f, ghij)

for (a, b, c, d), e, f, (g, h, i, j) in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)

#

Fails

Take note that I am trying to unpack the same tuples right away into the variables a, b, c, d, e, f, g, h, i, and j.

matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]

f = [
    f
    if f == "stable" else "preview"
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
a, b, c, d = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
g, h, i, j = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
]

abcdefghij = zip(a, b, c, d, e, f, g, h, i, j)

for a, b, c, d, e, f, g, h, i, j in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)

#

With this code, I get the following error message...

... a, b, c, d = [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1]
ValueError: too many values to unpack (expected 4)

Expectations

I would have expected these two approaches to perform exactly the same logic and produce exactly the same results.

They are not! Why?

Upvotes: 2

Views: 437

Answers (3)

Christopher Rucinski

Reputation: 4867

Solution

When a list comprehension creates a list of tuples and you want to unpack it column by column (all first elements into one variable, all second elements into another, and so on), you need to transpose the list with zip(*...) first:

x, y, z = zip(*list_comprehension)

# To be more clear
x, y, z = zip(*[(i, j, k) for (i, j, k) in tuple_list])
# In my code, this change must be made to this code
a, b, c, d = zip(*[
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
])

...

# And this code
g, h, i, j = zip(*[
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
])
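Putting both zip(*...) changes back into the question's failing code gives the self-contained sketch below. It runs on only the first eight rows of matches from the question (two true matches) and otherwise keeps the question's filters unchanged.

```python
# First eight rows of `matches` from the question (two "true matches").
matches = [
    ('', '', '', '', '', '', '', '', '', ''),
    ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''),
    ('', '', '', '', 'October 10, 2019', '', '', '', '', ''),
    ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'),
    ('', '', '', '', '', 'stable', '', '', '', ''),
    ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''),
    ('', '', '', '', 'October 2, 2019', '', '', '', '', ''),
    ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'),
]

# Rows 0, 4, ...: column f, with an empty value reported as "preview".
f = [
    f if f == "stable" else "preview"
    for index, (_, _, _, _, _, f, *_) in enumerate(matches)
    if index % 4 == 0
]
# Rows 1, 5, ...: columns a-d, transposed so each name receives one whole column.
a, b, c, d = zip(*[
    (a, b, c, d)
    for index, (a, b, c, d, *_) in enumerate(matches)
    if index % 4 == 1
])
# Rows 2, 6, ...: column e.
e = [
    e
    for index, (_, _, _, _, e, *_) in enumerate(matches)
    if index % 4 == 2
]
# Rows 3, 7, ...: columns g-j, transposed the same way.
g, h, i, j = zip(*[
    (g, h, i, j)
    for index, (*_, g, h, i, j) in enumerate(matches)
    if index % 4 == 3
])

# Every name now holds one column, so zip rebuilds one compact row per item.
rows = list(zip(a, b, c, d, e, f, g, h, i, j))
for row in rows:
    print(row)
```

One caveat of this design: if a filter selects no rows at all, zip(*[]) produces nothing, and the four-name assignment then raises ValueError (not enough values to unpack).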

Why

Let's take a look at the following code.

matches = [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a2", "b2", "c2", "d2", "e2"),
    ("a3", "b3", "c3", "d3", "e3"),
    ("a4", "b4", "c4", "d4", "e4"),
    ("a5", "b5", "c5", "d5", "e5")
]

# I want a tuple of a's, b's, and c's
abc = [
    (a, b, c)
    for (a, b, c, *_)  # Ignore elements `d` and `e`
    in matches
]

print("abc =", abc)
# abc = [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')]
# NOTE: This is a list of tuples of ones, twos, threes, fours, and fives
#       Not a's, b's, and c's!!

# I want a list of e's
e = [
    e
    for (*_, e) 
    in matches
]

print("e =", e)
# e = ['e1', 'e2', 'e3', 'e4', 'e5']
# NOTE: This is a list of e's

The problem with abc is that I get a list of ones, twos, threes, fours, and fives, not a list of a's, b's, and c's.

Deep Dive

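The error ValueError: too many values to unpack is raised because the list on the right-hand side contains more tuples than there are variables on the left-hand side. (With fewer tuples than variables, the error would be ValueError: not enough values to unpack instead.)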
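The count that matters is the number of items in the list, not the width of each tuple. A small sketch using a hypothetical helper, unpack_into_three, shows all three cases on the abc list from above:

```python
abc = [
    ("a1", "b1", "c1"),
    ("a2", "b2", "c2"),
    ("a3", "b3", "c3"),
    ("a4", "b4", "c4"),
    ("a5", "b5", "c5"),
]


def unpack_into_three(items):
    """Hypothetical helper: try `x, y, z = items`; return the values or the error text."""
    try:
        x, y, z = items
    except ValueError as exc:
        return str(exc)
    return (x, y, z)


print(unpack_into_three(abc))      # 5 items, 3 names: "too many values to unpack ..."
print(unpack_into_three(abc[:2]))  # 2 items, 3 names: "not enough values to unpack ..."
print(unpack_into_three(abc[:3]))  # 3 items, 3 names: succeeds, but each name holds a whole row
```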

Remember, abc is a list of five tuples (the ones, twos, threes, fours, and fives), not three tuples (the a's, b's, and c's). Unpacking assigns one list item per variable, so three variables cannot hold five items.

So this will always fail:

a, b, c = [
    (a, b, c)
    for (a, b, c, *_) 
    in matches
]

# ERROR
#    Traceback (most recent call last):
#      File "...*.py", line 11, in <module>
#        for (a, b, c, *_) in matches
#    ValueError: too many values to unpack (expected 3)

You are trying to put these five values [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')] into 3 variables. You can't! You would need 5 variables on the left-hand side, one for each tuple the list comprehension produces.

But this will succeed. The result is wrong, but it won't raise an error:

# This assigns each of the 5 variables one (a, b, c) tuple, taken from one of the original (a, b, c, d, e) tuples
ones, twos, threes, fours, fives = [
    (a, b, c)
    for (a, b, c, *_) in matches
]

print("ones =", ones)
print("twos =", twos)
print("threes =", threes)
print("fours =", fours)
print("fives =", fives)

# Output
# ones = ('a1', 'b1', 'c1')
# twos = ('a2', 'b2', 'c2')
# threes = ('a3', 'b3', 'c3')
# fours = ('a4', 'b4', 'c4')
# fives = ('a5', 'b5', 'c5')

Remember that we want something like ('a1', 'a2', 'a3', 'a4', 'a5'), not ('a1', 'b1', 'c1').

And if matches contained 20 tuples, you would need ..., sixes, sevens, ..., nineteens, twenties = [ ... ].

First Try

Well, we want all the 1st elements from each tuple to go together. Same for the 2nd and 3rd. So zip(...) seems like a good candidate. Let's look at the results.

result = list(zip(abc))
print(result)

# list(zip(abc)) = [(('a1', 'b1', 'c1'),), (('a2', 'b2', 'c2'),), (('a3', 'b3', 'c3'),), (('a4', 'b4', 'c4'),), (('a5', 'b5', 'c5'),)]

# Let's look at what one element looks like
print(result[0])
# result[0] = (('a1', 'b1', 'c1'),)

This is wrong!

As you can see, a few things are wrong:

  1. Weird tuple structure! Each tuple is wrapped inside another one-element tuple. This is what zip produces when you pass it a single list of tuples.
  2. Wrong elements in each tuple! We still get ('a1', 'b1', 'c1') grouped together, not ('a1', 'a2', 'a3', 'a4', 'a5').

Second Try

Well, passing the list of tuples to zip as a single argument doesn't work. We have to do something to the list of tuples first.

Let's look at this...

abc = [(a, b, c) for (a, b, c, *_) in matches]

print(abc)
# abc = [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')]
# Again, we cannot zip these

print(*abc)
# *abc = ('a1', 'b1', 'c1') ('a2', 'b2', 'c2') ('a3', 'b3', 'c3') ('a4', 'b4', 'c4') ('a5', 'b5', 'c5')
# The * unpacks the list into separate arguments: print receives five tuples, one after another, not a single list.

# What happens when we pass these tuples to zip as separate arguments?
print(list(zip(*abc)))
# list(zip(*abc)) = [('a1', 'a2', 'a3', 'a4', 'a5'), ('b1', 'b2', 'b3', 'b4', 'b5'), ('c1', 'c2', 'c3', 'c4', 'c5')]

# Great, so let's try this
a, b, c = zip(*abc)

That's what we want!!
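To make the role of the * explicit, the sketch below compares three calls on the same abc list: zip with the list as one argument, zip with the five rows written out by hand as five arguments, and zip(*abc).

```python
abc = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3"), ("a4", "b4", "c4"), ("a5", "b5", "c5")]

# One argument: zip iterates a single iterable, so each row is wrapped in a 1-tuple.
one_arg = list(zip(abc))

# Five arguments written out by hand: zip takes the first element of every row,
# then the second, then the third.
explicit = list(zip(abc[0], abc[1], abc[2], abc[3], abc[4]))

# The * in the call does the same spreading automatically, for any number of rows.
spread = list(zip(*abc))

print(one_arg[0])          # (('a1', 'b1', 'c1'),)
print(explicit == spread)  # True
print(spread[0])           # ('a1', 'a2', 'a3', 'a4', 'a5')
```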

Therefore

Since we can do the following:

a, b, c = zip(*abc)

print("a =", a)
print("b =", b)
print("c =", c)

# Output
# a = ('a1', 'a2', 'a3', 'a4', 'a5')
# b = ('b1', 'b2', 'b3', 'b4', 'b5')
# c = ('c1', 'c2', 'c3', 'c4', 'c5')

That means we can do this...

a, b, c, d = zip(*[
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
])
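Two small follow-ups to the transposition, sketched on the abc example: zip(*...) always produces tuples, and it produces nothing at all for an empty list. The helper transpose_or_empty below is hypothetical; it is one possible way to keep the unpacking from failing when an index % 4 filter selects no rows.

```python
abc = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3"), ("a4", "b4", "c4"), ("a5", "b5", "c5")]

# zip(*...) yields tuples; map(list, ...) converts each column to a list instead.
a, b, c = map(list, zip(*abc))

# Transposing the columns again restores the original rows.
rows_again = list(zip(a, b, c))


def transpose_or_empty(rows, width):
    """Hypothetical helper: return `width` column lists, even when `rows` is empty."""
    # zip(*[]) yields nothing, which would make `x, y, z = zip(*rows)` fail,
    # so fall back to `width` empty lists in that case.
    columns = [list(column) for column in zip(*rows)]
    return columns if columns else [[] for _ in range(width)]


x, y, z = transpose_or_empty([], 3)
print(a)                  # ['a1', 'a2', 'a3', 'a4', 'a5']
print(rows_again == abc)  # True
print(x, y, z)            # [] [] []
```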

Upvotes: 0

wwii

Reputation: 23783

@PaulPanzer That appears to work. I will have to verify that everything lines up correctly. But why do I need that?

Say q is an iterable for which your comprehension produces a list of 26 tuples, each with 4 items.

z = [(a,b,c,d) for i, (a,b,c,d,*e) in enumerate(q)]


In [6]: len(z)
Out[6]: 26

In [7]: len(z[0])
Out[7]: 4

In [17]: z[:3]
Out[17]: [('a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b'), ('c', 'c', 'c', 'c')]

When you try to unpack it, you are trying to stuff 26 items into four names/variables.

In [8]: a,b,c,d = z
Traceback (most recent call last):

  File "<ipython-input-8-64277b78f273>", line 1, in <module>
    a,b,c,d = z

ValueError: too many values to unpack (expected 4)

zip(*list_of_4_item_tuples) transposes list_of_4_item_tuples into 4 tuples of 26 items each.

In [9]: a,b,c,d = zip(*z)    # z is the result of the list comprehension shown above

In [11]: len(a),len(b),len(c),len(d)
Out[11]: (26, 26, 26, 26)

Test stuff

import string
a = string.ascii_lowercase
b = string.ascii_lowercase
c = string.ascii_lowercase
d = string.ascii_lowercase
e = string.ascii_lowercase
f = string.ascii_lowercase
q = zip(a, b, c, d, e, f)
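The IPython session above can be reproduced as one script using this test data. As a simplification, this sketch reuses a single string for all six columns instead of six separate names; the behaviour is the same.

```python
import string

# Test data from the answer: 26 rows ('a', 'a', ...), ('b', 'b', ...), ..., six items each.
letters = string.ascii_lowercase
q = zip(letters, letters, letters, letters, letters, letters)

# The comprehension keeps the first four items of each row.
z = [(a, b, c, d) for i, (a, b, c, d, *e) in enumerate(q)]
print(len(z), len(z[0]))  # 26 4

# Unpacking the list itself would need 26 names, so four names fail.
unpack_error = None
try:
    a, b, c, d = z
except ValueError as exc:
    unpack_error = str(exc)
print("ValueError:", unpack_error)

# Transposing first gives four tuples of 26 items each.
a, b, c, d = zip(*z)
print(len(a), len(b), len(c), len(d))  # 26 26 26 26
```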

Upvotes: 2

Corentin Pane

Reputation: 4943

Your list [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1] doesn't have exactly 4 elements, so trying to unpack it into only four variables fails.

Upvotes: 0
