user2668284

Copying to pre-allocated list slower than append which seems counterintuitive

This question has nothing to do with how to copy one list to another. It's about performance. Consider these two functions:

def funcA(data):
    A = []
    for d in data:
        A.append(d)
    return A


def funcB(data):
    A = [None] * len(data)
    i = 0
    for d in data:
        A[i] = d
        i += 1
    return A

funcA represents what I suppose one might call the classic way to make a shallow copy (ignoring the copy module for the sake of this discussion): the new list holds references to the same elements. It's easy to understand and performs extremely well. However, the final size of list A is unknown up front, so its storage has to grow dynamically as elements are appended to it.

Now consider funcB. It's more complex, but list A is pre-allocated and items are assigned by index. I would expect this to be faster because it avoids any memory allocation/reallocation that could otherwise occur during appending.

However, I have determined empirically that funcB runs ~30% slower than funcA.

I'm interested to know if anyone has thoughts on why this might be. Below is the driver code I used:

from datetime import datetime

M = 1_000
ARRAY_SIZE = 100_000
DATA = [0] * ARRAY_SIZE

for func in [funcA, funcB]:
    _s = datetime.now()
    for _ in range(M):
        func(DATA)
    _e = datetime.now()
    print(f'{_e-_s}')

Typical timings are 4.2 seconds for funcA and 5.4 seconds for funcB. This is obviously a CPU-bound activity and therefore results will vary considerably across platforms.
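For reference, the same comparison can be written with the timeit module, which handles the repetition and avoids the manual datetime bookkeeping; this is a sketch of an alternative driver, not the one used for the numbers above:

```python
import timeit

ARRAY_SIZE = 100_000
DATA = [0] * ARRAY_SIZE

def funcA(data):
    A = []
    for d in data:
        A.append(d)
    return A

def funcB(data):
    A = [None] * len(data)
    i = 0
    for d in data:
        A[i] = d
        i += 1
    return A

# number=10 keeps the sketch quick; raise it for steadier timings.
for func in [funcA, funcB]:
    t = timeit.timeit(lambda: func(DATA), number=10)
    print(f'{func.__name__}: {t:.3f}s')
```

Absolute numbers will differ from the datetime-based driver, but the relative gap between funcA and funcB should be similar.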

Upvotes: 1

Views: 105

Answers (1)

Karl Knechtel

Reputation: 61557

Timings on my system, running the reference implementation of Python 3.8, look like:

funcA: 6.0 seconds

funcB: 7.2 seconds

Optimizing funcA by caching the bound-method lookup, thus:

def funcD(data):
    A = []
    append = A.append
    for d in data:
        append(d)
    return A

4.3 seconds

Using a list comprehension: 2.5 seconds

Using any kind of built-in copy (data.copy(), data[:], list(data)): 0.9 seconds
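Sketches of the two faster variants timed above (the names funcC and funcE are assumed here, not from the answer):

```python
def funcC(data):
    # List comprehension: the append loop runs in specialized bytecode.
    return [d for d in data]

def funcE(data):
    # Built-in copy: the pointer copying happens entirely in C.
    # data[:] and list(data) behave equivalently for this purpose.
    return data.copy()

sample = list(range(5))
print(funcC(sample), funcE(sample))
```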

It seems pretty clear to me that the bottleneck is in the interpretation of bytecodes.

Your funcB, while it avoids resizing the internal storage, needs to fill the memory with copies of the pointer to the None object (whereas when the underlying storage grows normally, the unused portion of the memory can be left uninitialized). A function that does only the [None] * len(data) step already takes longer than the built-in copies (under the hood it can be optimized with memset() or equivalent, but then the built-ins can use memcpy() or equivalent). funcB also has to maintain the i variable and use it for indexing, interpreting bytecodes for each of those steps on every iteration.

Meanwhile, the list type uses the same exponential reallocation strategy seen in std::vector in C++. Although funcA needs to reallocate memory, it only needs to do so O(lg N) times, copying O(N) pointers in total between the allocations.
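You can observe the over-allocation indirectly with sys.getsizeof, which reports the list's allocated size in bytes; the size jumps only at a handful of lengths rather than on every append. (The exact growth pattern is a CPython implementation detail and varies between versions.)

```python
import sys

A = []
growth_points = []
for i in range(100):
    before = sys.getsizeof(A)
    A.append(i)
    after = sys.getsizeof(A)
    if after != before:
        # A reallocation happened on this append.
        growth_points.append((len(A), after))

# Far fewer reallocations than appends.
print(growth_points)
```

Over 100 appends, only about a dozen reallocations occur, which is why the amortized cost per append stays O(1).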

Upvotes: 2
