Reputation: 4867
I have used a semi-complex regex to retrieve data from a website. The issue I have is that I have to do some post-processing of the matched dataset.
I have gotten the data processed to probably 95+% of where I want it; however, I am getting a simple error message that I cannot reason about, which is strange.
I can bypass it, but that is beside the point. I am trying to figure out whether this is a bug or something I am fundamentally overlooking with my tuple unpacking.
One thing I have to overcome is that I get 4 matches for every "true match". That means that my data for 1 single item is spread out over 4 matches.
In simple graphical form (slightly oversimplified):
index |  a    b    c    d    e    f    g    h    i    j
--------------------------------------------------------
    1 | ( ), ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( )
    2 | (█), (█), (█), (█), ( ), ( ), ( ), ( ), ( ), ( )
    3 | ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( ), ( )
    4 | ( ), ( ), ( ), ( ), ( ), ( ), (█), (█), (█), (█)
    5 | ( ), ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( )
    6 | (▒), (▒), (▒), (▒), ( ), ( ), ( ), ( ), ( ), ( )
    7 | ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( ), ( )
    8 | ( ), ( ), ( ), ( ), ( ), ( ), (▒), (▒), (▒), (▒)
    9 | ...
...
  615 | ...
I can get all the data, but I want to compact it, like so...
index |  a    b    c    d    e    f    g    h    i    j
--------------------------------------------------------
    1 | (█), (█), (█), (█), (█), (█), (█), (█), (█), (█)
    2 | (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒)
    3 | ...
...
  154 | ...
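The compaction in the diagram can also be done without the index % 4 filters. The sketch below is an alternative technique, not the approach used in the question: it assumes every true match occupies exactly 4 consecutive rows and that each column is filled in at most one row of its group. compact() and the sample rows are hypothetical.

```python
def compact(rows, group_size=4):
    """Hypothetical helper: merge each run of `group_size` rows into one row.

    Assumes each column is non-empty in at most one row of a group.
    """
    merged = []
    # zip(*[it] * n) pulls n items from the SAME iterator per step, yielding
    # non-overlapping groups of n rows; an incomplete trailing group is dropped.
    it = iter(rows)
    for group in zip(*[it] * group_size):
        # zip(*group) transposes the group into one tuple per column; keep the
        # first non-empty value in each column ("" if the column is all empty).
        merged.append(tuple(next((v for v in column if v), "") for column in zip(*group)))
    return merged


# Placeholder data shaped like the diagram: one true match spread over 4 rows
rows = [
    ("", "", "", "", "", "f1", "", "", "", ""),
    ("a1", "b1", "c1", "d1", "", "", "", "", "", ""),
    ("", "", "", "", "e1", "", "", "", "", ""),
    ("", "", "", "", "", "", "g1", "h1", "i1", "j1"),
]
print(compact(rows))
# [('a1', 'b1', 'c1', 'd1', 'e1', 'f1', 'g1', 'h1', 'i1', 'j1')]
```

Because the trailing incomplete group is silently dropped, a row count that is not a multiple of 4 loses the leftover rows; checking len(rows) % 4 first is worthwhile if that matters.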
Take note of the variables abcd, e, f, and ghij, and how I have to unpack them in the for-loop at the bottom:
matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]
f = [
f
for index, (_, _, _, _, _, f, *_)
in enumerate(matches)
if index % 4 == 0
]
abcd = [
(a, b, c, d)
for index, (a, b, c, d, *_)
in enumerate(matches)
if index % 4 == 1
]
e = [
e
for index, (_, _, _, _, e, *_)
in enumerate(matches)
if index % 4 == 2
]
ghij = [
(g, h, i, j)
for index, (*_, g, h, i, j)
in enumerate(matches)
if index % 4 == 3
]
abcdefghij = zip(abcd, e, f, ghij)
for (a, b, c, d), e, f, (g, h, i, j) in abcdefghij:
print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)
#
Take note that I am trying to unpack the same tuples right away into the variables a, b, c, d, e, f, g, h, i, and j:
matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]
f = [
f
if f == "stable" else "preview"
for index, (_, _, _, _, _, f, *_)
in enumerate(matches)
if index % 4 == 0
]
a, b, c, d = [
(a, b, c, d)
for index, (a, b, c, d, *_)
in enumerate(matches)
if index % 4 == 1
]
e = [
e
for index, (_, _, _, _, e, *_)
in enumerate(matches)
if index % 4 == 2
]
g, h, i, j = [
(g, h, i, j)
for index, (*_, g, h, i, j)
in enumerate(matches)
if index % 4 == 3]
abcdefghij = zip(a, b, c, d, e, f, g, h, i, j)
for a, b, c, d, e, f, g, h, i, j in abcdefghij:
print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)
#
With this code, I get the following error message...
...
    a, b, c, d = [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1]
ValueError: too many values to unpack (expected 4)
I would have expected these two methods to apply exactly the same logic and produce exactly the same results.
They don't! Why?
Upvotes: 2
Views: 437
Reputation: 4867
When a list comprehension creates a list of tuples and you want to unpack those tuples column-wise (all first elements into one variable, all second elements into another, and so on), you need to transpose the list with zip(*...):
x, y, z = zip(*list_comprehension)
# To be more clear
x, y, z = zip(*[(i, j, k) for (i, j, k) in tuple_list])
# For my code, this change must be made to this code
a, b, c, d = zip(*[
(a, b, c, d)
for index, (a, b, c, d, *_)
in enumerate(matches)
if index % 4 == 1
])
...
# And this code
g, h, i, j = zip(*[
(g, h, i, j)
for index, (*_, g, h, i, j)
in enumerate(matches)
if index % 4 == 3
])
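To check that the fix lines everything up, here is the question's second block with both zip(*...) changes applied, run on the first 8 rows of the question's matches (two true matches). It is a verification sketch; the final printing is simplified.

```python
# First 8 rows of the question's `matches`: two "true matches" of 4 rows each
matches = [
    ('', '', '', '', '', '', '', '', '', ''),
    ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''),
    ('', '', '', '', 'October 10, 2019', '', '', '', '', ''),
    ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'),
    ('', '', '', '', '', 'stable', '', '', '', ''),
    ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''),
    ('', '', '', '', 'October 2, 2019', '', '', '', '', ''),
    ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'),
]

# Row 0 of each group holds f ("stable" or empty)
f = [f if f == "stable" else "preview"
     for index, (_, _, _, _, _, f, *_) in enumerate(matches) if index % 4 == 0]

# Row 1 holds a-d: transpose so each name receives one column
a, b, c, d = zip(*[(a, b, c, d)
                   for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1])

# Row 2 holds e
e = [e for index, (_, _, _, _, e, *_) in enumerate(matches) if index % 4 == 2]

# Row 3 holds g-j: transpose again
g, h, i, j = zip(*[(g, h, i, j)
                   for index, (*_, g, h, i, j) in enumerate(matches) if index % 4 == 3])

for row in zip(a, b, c, d, e, f, g, h, i, j):
    print(row)
```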
Let's take a look at the following code.
matches = [
("a1", "b1", "c1", "d1", "e1"),
("a2", "b2", "c2", "d2", "e2"),
("a3", "b3", "c3", "d3", "e3"),
("a4", "b4", "c4", "d4", "e4"),
("a5", "b5", "c5", "d5", "e5")
]
# I want a tuple of a's, b's, and c's
abc = [
(a, b, c)
for (a, b, c, *_) # Ignore elements `d` and `e`
in matches
]
print("abc =", abc)
# abc = [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')]
# NOTE: This is a list of tuples of ones, twos, threes, fours, and fives
# Not a's, b's, and c's!!
# I want a list of e's
e = [
e
for (*_, e)
in matches
]
print("e =", e)
# e = ['e1', 'e2', 'e3', 'e4', 'e5']
# NOTE: This is a list of e's
The problem with abc is that I get a list of ones, twos, threes, fours, and fives, not a list of a's, b's, and c's.
The error message ValueError: too many values to unpack is raised because the list contains more tuples than there are names on the left-hand side of the assignment (with fewer tuples than names, you get ValueError: not enough values to unpack instead).
Remember, you have a list of ones, twos, threes, fours, and fives (5 tuples of 3 elements each), not a's, b's, and c's (3 tuples of 5 elements each).
So this will always fail:
a, b, c = [
(a, b, c)
for (a, b, c, *_)
in matches
]
# ERROR
# Traceback (most recent call last):
# File "...*.py", line 11, in <module>
# for (a, b, c, *_) in matches
# ValueError: too many values to unpack (expected 3)
You are trying to put these five tuples [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')] into three names. You can't! The number of names on the left-hand side has to match the number of tuples the list comprehension produces, which is five here.
But this will succeed. It is not what we want, but it won't raise an error.
# This will assign 5 variables with the tuples (a, b, c) from the original tuples (a, b, c, d, e)
ones, twos, threes, fours, fives = [
(a, b, c)
for (a, b, c, *_) in matches
]
print("ones =", ones)
print("twos =", twos)
print("threes =", threes)
print("fours =", fours)
print("fives =", fives)
# Output
# ones = ('a1', 'b1', 'c1')
# twos = ('a2', 'b2', 'c2')
# threes = ('a3', 'b3', 'c3')
# fours = ('a4', 'b4', 'c4')
# fives = ('a5', 'b5', 'c5')
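The rule behind both outcomes is that plain unpacking matches names to list items one-to-one. This small sketch rebuilds abc with the same a1 to c5 pattern and captures the two ValueError messages plus the case that works. The exact wording of the "too many" message can vary slightly between Python versions, so only its prefix is noted.

```python
# Same shape as `abc` above: 5 tuples of 3 items each
abc = [(f"a{n}", f"b{n}", f"c{n}") for n in range(1, 6)]

# 3 names, 5 tuples: more values than names
try:
    a, b, c = abc
except ValueError as err:
    too_many = str(err)   # starts with "too many values to unpack (expected 3"

# 6 names, 5 tuples: fewer values than names
try:
    a, b, c, d, e, f = abc
except ValueError as err:
    too_few = str(err)    # "not enough values to unpack (expected 6, got 5)"

# 5 names, 5 tuples: succeeds, but each name gets a whole row, not a column
ones, twos, threes, fours, fives = abc

print(too_many)
print(too_few)
print(ones)   # ('a1', 'b1', 'c1')
```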
Remember that we want something like ('a1', 'a2', 'a3', 'a4', 'a5'), not ('a1', 'b1', 'c1').
And if there were 20 tuples, you would need ..., sixes, sevens, ..., nineteens, twenties = [ ... ]
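Extended iterable unpacking (PEP 3132) removes the need to spell out twenties: a starred name collects whatever is left over. As this sketch shows, though, it still splits abc by rows, so it does not produce the a's, b's, and c's either.

```python
abc = [(f"a{n}", f"b{n}", f"c{n}") for n in range(1, 6)]

# The starred name absorbs the remaining items as a list
first, *rest = abc
print(first)      # ('a1', 'b1', 'c1')
print(len(rest))  # 4

# The starred name can also sit at the front
*head, last = abc
print(last)       # ('a5', 'b5', 'c5')
```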
Well, we want all the 1st elements from each tuple to go together, and the same for the 2nd and 3rd elements. So zip(...) seems like a good candidate. Let's look at the results.
result = list(zip(abc))
print(result)
# list(zip(abc)) = [(('a1', 'b1', 'c1'),), (('a2', 'b2', 'c2'),), (('a3', 'b3', 'c3'),), (('a4', 'b4', 'c4'),), (('a5', 'b5', 'c5'),)]
# Let's look at what one element looks like
print(result[0])
# result[0] = (('a1', 'b1', 'c1'),)
This is wrong!
zip(abc) returns a list of 1-element tuples, each of which just wraps one of the original tuples, so result[0] is still the ones, not a tuple of a's.
Well, zip doesn't work on a list of tuples as-is: passed a single list, it treats that whole list as one iterable. We have to do something to the list of tuples first.
Let's look at this...
abc = [(a, b, c) for (a, b, c, *_) in matches]
print(abc)
# abc = [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')]
# Again, we cannot zip these
print(*abc)
# *abc = ('a1', 'b1', 'c1') ('a2', 'b2', 'c2') ('a3', 'b3', 'c3') ('a4', 'b4', 'c4') ('a5', 'b5', 'c5')
# Wait, here the * passes each tuple as a separate argument. Not a list of tuples. Just tuple after tuple after tuple.
# What happens when we zip this "sequence" of tuples?
print(list(zip(*abc)))
# list(zip(*abc)) = [('a1', 'a2', 'a3', 'a4', 'a5'), ('b1', 'b2', 'b3', 'b4', 'b5'), ('c1', 'c2', 'c3', 'c4', 'c5')]
# Great, so let's try this
a, b, c = zip(*abc)
That's what we want!!
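What the * does can be made explicit: zip(*abc) is the same call as zip(abc[0], abc[1], abc[2], abc[3], abc[4]), so zip receives five separate tuples and groups their 1st, 2nd, and 3rd items. A quick check, rebuilding abc with the same pattern:

```python
abc = [(f"a{n}", f"b{n}", f"c{n}") for n in range(1, 6)]

# Starred call vs. the same call with every argument written out
starred = list(zip(*abc))
explicit = list(zip(abc[0], abc[1], abc[2], abc[3], abc[4]))
print(starred == explicit)  # True

# Without the *, zip gets ONE argument and just wraps each tuple
print(list(zip(abc))[0])    # (('a1', 'b1', 'c1'),)

a, b, c = zip(*abc)
print(a)                    # ('a1', 'a2', 'a3', 'a4', 'a5')
```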
So we can do the following:
a, b, c = zip(*abc)
print("a =", a)
print("b =", b)
print("c =", c)
# Output
# a = ('a1', 'a2', 'a3', 'a4', 'a5')
# b = ('b1', 'b2', 'b3', 'b4', 'b5')
# c = ('c1', 'c2', 'c3', 'c4', 'c5')
That means we can do this...
a, b, c, d = zip(*[
(a, b, c, d)
for index, (a, b, c, d, *_)
in enumerate(matches)
])
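Two caveats with this pattern. If the filter selects nothing, zip(*[]) is just zip(), which yields no tuples, so the unpacking fails with not enough values to unpack. And zip stops at its shortest input, so if the abcd, e, f, and ghij lists in the question ever differ in length, the final zip silently drops rows (Python 3.10 and later accept zip(..., strict=True) to raise an error instead). The empty-tuple fallback below is one possible guard, an assumption of this sketch rather than something from the original answers.

```python
# Hypothetical case: the `index % 4 == 1` filter matched no rows
selected = []

try:
    a, b, c, d = zip(*selected)   # zip() with no iterables yields nothing
except ValueError as err:
    empty_error = str(err)        # "not enough values to unpack (expected 4, got 0)"
print(empty_error)

# One possible guard: fall back to four empty tuples
a, b, c, d = zip(*selected) if selected else ((), (), (), ())
print(a, b, c, d)                 # () () () ()

# zip truncates to the shortest input without complaint
e = ["e1", "e2", "e3"]
f = ["f1", "f2"]                  # one item short
print(list(zip(e, f)))            # [('e1', 'f1'), ('e2', 'f2')]
```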
Upvotes: 0
Reputation: 23783
@PaulPanzer That appears to work. I will have to verify that everything lines up correctly. But why do I need that?
Say q is an iterable for which (?) your comprehension produces a list with 26 tuples, and each tuple has 4 items.
z = [(a,b,c,d) for i, (a,b,c,d,*e) in enumerate(q)]
In [6]: len(z)
Out[6]: 26
In [7]: len(z[0])
Out[7]: 4
In [17]: z[:3]
Out[17]: [('a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b'), ('c', 'c', 'c', 'c')]
When you try to unpack, you are trying to stuff 26 items into four names/variables:
In [8]: a,b,c,d = z
Traceback (most recent call last):
File "<ipython-input-8-64277b78f273>", line 1, in <module>
a,b,c,d = z
ValueError: too many values to unpack (expected 4)
zip(*list_of_4_item_tuples) will transpose list_of_4_item_tuples into 4 tuples with 26 items each:
In [9]: a,b,c,d = zip(*z) # z is the result of the list comprehension shown above
In [11]: len(a),len(b),len(c),len(d)
Out[11]: (26, 26, 26, 26)
Test stuff
import string
a = string.ascii_lowercase
b = string.ascii_lowercase
c = string.ascii_lowercase
d = string.ascii_lowercase
e = string.ascii_lowercase
f = string.ascii_lowercase
q = zip(a, b, c, d, e, f)
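For reference, the session above can be reproduced as a plain script. This sketch uses one letters string in place of the six identical variables a through f and adds shape checks:

```python
import string

letters = string.ascii_lowercase
q = zip(letters, letters, letters, letters, letters, letters)  # 26 six-item tuples

# The comprehension from the answer: keep the first 4 items of each tuple
z = [(a, b, c, d) for i, (a, b, c, d, *e) in enumerate(q)]
print(len(z), len(z[0]))  # 26 4

try:
    a, b, c, d = z        # 26 items into 4 names
except ValueError as err:
    print(err)

a, b, c, d = zip(*z)      # transpose: 4 tuples of 26 items each
print(len(a), len(b), len(c), len(d))  # 26 26 26 26
print(a[:3])              # ('a', 'b', 'c')
```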
Upvotes: 2
Reputation: 4943
Your list [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1] doesn't have exactly 4 elements, so trying to unpack it into exactly four variables fails.
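The length of that list equals the number of true matches, not the number of columns, so four-name unpacking only works if there happen to be exactly four matches. With a 12-row input like the question's sample, the failure actually goes the other way. The placeholder rows below are hypothetical and only mimic the 12-row, 10-field shape:

```python
# Hypothetical placeholder: 12 rows x 10 fields, i.e. 3 "true matches" of 4 rows
matches = [tuple(f"r{row}c{col}" for col in range(10)) for row in range(12)]

selected = [(a, b, c, d)
            for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1]
print(len(selected))  # 3: one tuple per true match

try:
    a, b, c, d = selected
except ValueError as err:
    error_message = str(err)  # "not enough values to unpack (expected 4, got 3)"
print(error_message)

# Transposing gives one tuple per name, as long as there is at least one match
a, b, c, d = zip(*selected)
print(a)  # ('r1c0', 'r5c0', 'r9c0')
```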
Upvotes: 0