Recently I've been writing a download program that uses the HTTP `Range` header to download many blocks at the same time. I wrote a Python class to represent a range (the `Range` header uses closed intervals, i.e. both ends are inclusive):
```python
class ClosedRange:
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end

    def __iter__(self):
        yield self.begin
        yield self.end

    def __str__(self):
        return '[{0.begin}, {0.end}]'.format(self)

    def __len__(self):
        return self.end - self.begin + 1
```
The `__iter__` magic method is there to support tuple unpacking:

```python
header = {'Range': 'bytes={}-{}'.format(*the_range)}
```

And `len(the_range)` is the number of bytes in that range.
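To make the use case concrete, here is a minimal sketch of how such ranges might be produced for a parallel download. `split_into_ranges` is a hypothetical helper (not part of the original program), and the file and block sizes are made up; with blocks this small, unpacking with `*` is harmless.

```python
class ClosedRange:
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end

    def __iter__(self):
        yield self.begin
        yield self.end

    def __str__(self):
        return '[{0.begin}, {0.end}]'.format(self)

    def __len__(self):
        # Both ends are inclusive, hence the + 1.
        return self.end - self.begin + 1


def split_into_ranges(total_size, block_size):
    """Hypothetical helper: cover bytes 0..total_size-1 with closed ranges."""
    ranges = []
    for begin in range(0, total_size, block_size):
        end = min(begin + block_size, total_size) - 1  # inclusive end
        ranges.append(ClosedRange(begin, end))
    return ranges


blocks = split_into_ranges(10, 4)
for block in blocks:
    header = {'Range': 'bytes={}-{}'.format(*block)}
    print(block, len(block), header)
```

For a 10-byte file split into 4-byte blocks this yields the ranges `[0, 3]`, `[4, 7]` and `[8, 9]`, whose lengths add up to the file size.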
Now I've found that `'bytes={}-{}'.format(*the_range)` occasionally raises a `MemoryError`. After some debugging I found that the CPython interpreter tries to call `len(iterable)` when executing `func(*iterable)`, and may allocate memory based on that length. On my machine, the `MemoryError` appears when `len(the_range)` is greater than 1 GB.
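Since the length lookup is triggered by the `*` unpacking, one way to sidestep it is to read the attributes directly, the same way `__str__` already does. A minimal sketch, assuming the `ClosedRange` class above (repeated here so the block runs on its own); `range_header` is a hypothetical helper name:

```python
class ClosedRange:
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end

    def __iter__(self):
        yield self.begin
        yield self.end

    def __len__(self):
        # Number of bytes in the closed interval.
        return self.end - self.begin + 1


def range_header(the_range):
    # Hypothetical helper. Passing the object itself as the single
    # positional argument and reading .begin/.end inside the format
    # string means no argument tuple is built *from* the object, so
    # __len__ is never consulted as a size hint.
    return {'Range': 'bytes={0.begin}-{0.end}'.format(the_range)}


big = ClosedRange(0, 4 * 1024**3 - 1)  # a 4 GiB block
print(range_header(big))
```

`len(big)` still reports the byte count (on a 64-bit build), but building the header never asks for it, so the huge length no longer matters here.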
This is a simplified version:

```python
class C:
    def __iter__(self):
        yield 5

    def __len__(self):
        print('__len__ called')
        return 1024**3

def f(*args):
    return args
```

```python
>>> c = C()
>>> f(*c)
__len__ called
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> # BTW, `list(the_range)` has the same problem.
>>> list(c)
__len__ called
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
```
So my questions are:

1. Why does CPython call `len(iterable)`? From this question I see that you can't know an iterator's length until you iterate through it. Is this an optimization?
2. Can a `__len__` method return a 'fake' length (i.e. not the real number of elements in memory) of an object?
> Why does CPython call `len(iterable)`? From this question I see that you can't know an iterator's length until you iterate through it. Is this an optimization?
When Python (assuming Python 3) executes `f(*c)`, the `CALL_FUNCTION_EX` opcode is used:

```
 0 LOAD_GLOBAL              0 (f)
 2 LOAD_GLOBAL              1 (c)
 4 CALL_FUNCTION_EX         0
 6 POP_TOP
```
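To reproduce a listing like this yourself, the standard `dis` module can disassemble the call expression. A minimal sketch; the exact output differs between CPython versions (offsets, plus extra instructions such as `RESUME` or `PUSH_NULL` on 3.11+), and compiling in `'eval'` mode ends with `RETURN_VALUE` rather than `POP_TOP`, so the listing above is illustrative:

```python
import dis

# Compile just the call expression; `f` and `c` need not exist yet,
# because compiling does not look the names up.
code = compile('f(*c)', '<demo>', 'eval')
dis.dis(code)

# The opcode names alone, e.g. to check which call instruction is used.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)
```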
Since `c` is an arbitrary iterable rather than a tuple, `PySequence_Tuple` is called to convert it to one, and it calls `PyObject_LengthHint` to decide how big the new tuple should be. Because `c` defines a `__len__` method, that method is called and its return value is used to allocate memory for the new tuple. The allocation fails, so a `MemoryError` is raised.
```c
/* Guess result size and allocate space. */
n = PyObject_LengthHint(v, 10);
if (n == -1)
    goto Fail;
result = PyTuple_New(n);
```
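`PyObject_LengthHint` is exposed to Python code as `operator.length_hint(obj, default)`, so its lookup order can be checked without reading C. A small sketch; the three classes are made up for illustration, and the default of 10 mirrors the `10` passed in the snippet above:

```python
import operator

class WithLen:
    def __iter__(self):
        yield 5
    def __len__(self):
        return 3          # deliberately not the real item count

class WithHint:
    def __iter__(self):
        yield 5
    def __length_hint__(self):
        return 7          # only an estimate, no __len__ at all

class Plain:
    def __iter__(self):
        yield 5

print(operator.length_hint(WithLen(), 10))   # __len__ is tried first
print(operator.length_hint(WithHint(), 10))  # then __length_hint__
print(operator.length_hint(Plain(), 10))     # else the default
```

This is why `C.__len__` in the question is called: any object that defines `__len__` has its answer taken as the size hint, whether or not it matches what `__iter__` actually yields.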
> Can a `__len__` method return a 'fake' length (i.e. not the real number of elements in memory) of an object?

In this scenario, yes.
If the value returned by `__len__` is smaller than needed, Python grows the new tuple as it fills it. If it is larger than needed, Python allocates the extra memory up front, but `_PyTuple_Resize` is called at the end to reclaim the over-allocated space. The catch is that the over-sized allocation has to succeed first, which is exactly what fails in your `MemoryError` case.
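The grow-then-trim behaviour can be modelled in pure Python. This is a rough sketch, not CPython's code: it ignores the fast paths for exact tuples and lists, uses a Python list to stand in for the tuple's slots, and the growth step (add 10, then another 25%) is my reading of the loop in `PySequence_Tuple`, so treat those constants as an assumption and check `Objects/abstract.c` for your version. `model_sequence_tuple` is a hypothetical name.

```python
import operator

def model_sequence_tuple(iterable):
    """Model how PySequence_Tuple sizes its result for a generic iterable.

    Returns the resulting tuple and the slot counts allocated along the way.
    """
    n = operator.length_hint(iterable, 10)  # same default as the C code
    slots = [None] * n                      # stands in for PyTuple_New(n)
    sizes = [n]
    j = 0
    for item in iterable:
        if j >= n:
            # Hint was too small: grow by ten, then by another 25%.
            n += 10
            n += n >> 2
            slots.extend([None] * (n - len(slots)))
            sizes.append(n)
        slots[j] = item
        j += 1
    if j < n:
        # Hint was too large: cut back to the items actually stored.
        del slots[j:]
        sizes.append(j)
    return tuple(slots), sizes


class Under:
    def __iter__(self):
        yield from range(5)
    def __len__(self):
        return 1      # claims fewer items than it yields

class Over:
    def __iter__(self):
        yield from range(3)
    def __len__(self):
        return 100    # claims more items than it yields

print(model_sequence_tuple(Under()))  # ((0, 1, 2, 3, 4), [1, 13, 5])
print(model_sequence_tuple(Over()))   # ((0, 1, 2), [100, 3])
```

In both cases the final tuple is correct; only the intermediate allocations differ. With `__len__` returning `1024**3`, the very first `[None] * n` is what would blow up, just like `PyTuple_New(n)` in the C code.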