Gregory Kuhn

Reputation: 1697

python memoryview slower than expected

Given that Python's memoryview interface to the buffer protocol can help reduce the need to make interim copies of data, I decided to do a quick test of it based on this answer to this question.

import time

# Each expression is timed against both a bytes object and a memoryview,
# bound to the name `b` inside eval().
expressions = ['list(b[i:i+1000])',
               'list(b[i:])',
               'b[i:]'
              ]

size = 1000000
x = b'x'*size
mv = memoryview(x)
for e in expressions:
    print(f"Expression: {e}")
    for b in (x, mv):
        l = len(b)
        start = time.time()
        for i in range(0, l, 1000):
            eval(e)
        end = time.time()
        print(f"Size: {size}, {type(b).__name__}, time: {end-start}")

The result:

$ python c:\temp\test_memoryview.py
Expression: list(b[i:i+1000])
Size: 1000000, bytes, time: 0.021999597549438477
Size: 1000000, memoryview, time: 0.03600668907165527
Expression: list(b[i:])
Size: 1000000, bytes, time: 5.3010172843933105
Size: 1000000, memoryview, time: 11.202003479003906
Expression: b[i:]
Size: 1000000, bytes, time: 0.2990117073059082
Size: 1000000, memoryview, time: 0.006985902786254883

The first two results are quite surprising. I understand that calling list involves a copy of the data, but I thought that by slicing a memoryview instead of the underlying bytes object, I would save on an interim copy.

Upvotes: 1

Views: 901

Answers (1)

user2357112

Reputation: 281012

You can't think of Python like you would C or C++. The constant-factor overhead of an extra copy is far lower than the constant-factor overhead involved in supporting all of Python's dynamic features, especially with no JIT in CPython. You can't assume saving one copy is going to actually help once you take into account the other stuff you have to change to avoid that copy.

In this case, almost all of the work is in the list conversion. The copy you're saving is meaningless. Compare the timings for b[i:] and list(b[i:]), and you'll see the slicing is only a few percent of the runtime even when the slice performs a copy.
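The slicing cost itself can be checked directly: a memoryview slice is just a new view over the original buffer, while a bytes slice allocates and copies. A minimal sketch (using `memoryview.obj`, which exposes the underlying object):

```python
x = b'x' * 1_000_000
mv = memoryview(x)

# A memoryview slice is a new view, not a copy: .obj still points
# at the original bytes object.
view = mv[500_000:]
print(view.obj is x)   # the view shares x's buffer
print(len(view))

# Slicing the bytes object itself allocates and copies a new object.
chunk = x[500_000:]
print(chunk is x)      # an independent copy
```

This is why the bare `b[i:]` expression is nearly free for the memoryview, while the bytes version has to copy up to a megabyte per slice.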

The copy you save doesn't matter because it's basically just a memcpy. In contrast, the list conversion needs to create an iterator over the bytestring or memoryview, call the iterator's tp_iternext slot repeatedly, obtain int objects corresponding to the raw bytes of memory, etc., which is way more expensive. It's even more expensive for the memoryview, because memoryview objects have to support multidimensional shapes and non-byte data types, and because the memoryview implementation doesn't have a dedicated __iter__ implementation, so it goes through the generic sequence-based fallback iteration, which is slower.
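This per-item overhead is easy to see: converting to a list boxes every byte as a separate `int` object, regardless of whether the source copies its slices. A rough comparison (timings are illustrative and machine-dependent, not from the original answer):

```python
import timeit

x = b'x' * 100_000
mv = memoryview(x)

# list() walks the iterator protocol and creates one int per byte.
print(timeit.timeit(lambda: list(x), number=20))
print(timeit.timeit(lambda: list(mv), number=20))  # generic fallback iteration

# A plain slice copy of the same data is comparatively trivial.
print(timeit.timeit(lambda: x[:], number=20))
```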

You can save some time by using the memoryview's tolist method instead of calling list. This skips a bunch of iteration protocol overhead and allows some checks to be done just once instead of once per item. In my tests, this is almost as fast as calling list on a bytestring.
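For instance (a minimal sketch; exact timings will vary):

```python
import timeit

x = b'x' * 100_000
mv = memoryview(x)

# tolist() produces the same result as list() ...
assert mv.tolist() == list(x)

# ... but converts in a single C-level pass, skipping the
# per-item iterator protocol dispatch.
print(timeit.timeit(lambda: list(mv), number=50))
print(timeit.timeit(lambda: mv.tolist(), number=50))
print(timeit.timeit(lambda: list(x), number=50))
```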

Upvotes: 4

Related Questions