Reputation: 8322
I was going to find the length of a sequence in Python 3 and found that `range` is now of `<class 'type'>`, not `<type 'builtin_function_or_method'>`. Being so, a `range` call seems to create a lazy sequence object in memory without the overhead of creating and populating a list as in Python 2 (please correct me if I'm misunderstanding this or am wrong about it). My question is: would there be any noticeable improvement when calculating the length of a sequence if I call `len(range(start, stop, step))` instead of calculating `int(math.ceil((stop - start) / step))`?
This is a rough example of what I would be doing:

```python
limit = 100000
sqrtn = int(limit ** 0.5)

s = list(range(limit))
s[1] = 0
for i in range(2, sqrtn + 1):
    if s[i]:
        s[i*i:limit:i] = [0] * len(range(i*i, limit, i))  # call to range
```
Would the above actually be more efficient than calculating the length directly?
```python
from math import ceil

limit = 100000
sqrtn = int(limit ** 0.5)

s = list(range(limit))
s[1] = 0
for i in range(2, sqrtn + 1):
    if s[i]:
        s[i*i:limit:i] = [0] * int(ceil((limit - i*i) / i))  # no range call
```
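For what it's worth, here is a quick sanity check (using a small, arbitrarily chosen `limit`, just for illustration) that the two length expressions agree for the slice parameters the sieve actually uses:

```python
from math import ceil

limit = 100  # small, assumed value for illustration only
for i in range(2, int(limit ** 0.5) + 1):
    # Both expressions count the multiples i*i, i*i + i, ... below limit
    assert len(range(i*i, limit, i)) == int(ceil((limit - i*i) / i))
print("lengths agree")
```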
Upvotes: 0
Views: 97
Reputation: 155546
In theory, yes; in practice, no. All timings are from a Windows x64 build of CPython 3.5.0 (the specific times are irrelevant; the relative times for each approach are what matter):

```python
>>> import timeit
>>> from math import ceil
>>> start, stop, step = 100*100, 100000, 100
>>> min(timeit.repeat('(stop - start + (step - 1)) // step', 'from __main__ import start, stop, step', number=100000))
0.016031580173375914
>>> min(timeit.repeat('ceil((stop - start) / step)', 'from __main__ import start, stop, step, ceil', number=100000))
0.024184756985505373
>>> min(timeit.repeat('len(range(start, stop, step))', 'from __main__ import start, stop, step', number=100000))
0.03917228338013956
```
I've run those tests with a few different endpoints; if the values get large enough that the math can't be done in a `Py_ssize_t`, the `range` and `ceil` approaches get closer together (`ceil` slows down), but the pure `int` math approach wins every test I've run. And both `range` and `ceil` have problems: for very large numbers, `range` will throw `OverflowError` (it can't have more elements than a `Py_ssize_t` can represent), and `ceil` (or rather, the float division before the `ceil`) will have floating point accuracy errors once you exceed ~53-bit values. The pure `int` math is both fast and reliable, and should probably be preferred.
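To make both failure modes concrete, here is a small sketch (the endpoints are arbitrary, chosen only to trip each problem on a typical 64-bit build):

```python
import sys
from math import ceil

start, stop, step = 0, 10**18 + 1, 1

exact = (stop - start + (step - 1)) // step  # pure int math: always exact
approx = ceil((stop - start) / step)         # true division goes through a float
print(exact - approx)  # 1 -- float(10**18 + 1) rounds down to 10**18

try:
    len(range(sys.maxsize + 1))  # more elements than a Py_ssize_t can hold
except OverflowError as e:
    print("OverflowError:", e)
```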
That said, other Python interpreters (PyPy, IronPython, Jython, Cython) can special-case stuff like `range` (and integer math, for that matter), and could easily have completely different performance characteristics.

The real overhead here is not the `len` calculation. `range` actually computes the length and caches it internally during construction; retrieving it is as close to free as any named function call gets (and the same is true of all built-in sequences; at worst they have to construct a Python-level `int` from a C-level `int`, but all math operations do the same thing). The actual cost to retrieve the length:
```python
>>> min(timeit.repeat('len(r)', 'from __main__ import start, stop, step; r = range(start, stop, step)', number=100000))
0.0076398965929911355
```
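You can see the cached length in action even on a huge range (the endpoint `10**18` and step `7` here are arbitrary): `len()` just returns the precomputed value, without materializing or iterating anything:

```python
r = range(0, 10**18, 7)             # nothing is materialized or iterated
print(len(r) == (10**18 + 6) // 7)  # True: matches the pure int ceiling division
```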
Upvotes: 1