cfrei89
cfrei89

Reputation: 123

Deepcopy list of lists (speed issue)

Due to unclear formulation I decided to rewrite my question: My code looks sth like this (org is supposed to be a list of a list and two integers):

def my_copy(org):
    temp = (tuple(org[0]), org[1], org[2])
    temp2 = []
    temp2.append(list(temp[0]))
    temp2.append(temp[1])
    temp2.append(temp[2])
    return temp2
a = [[1,2,3], 4, 5]
b = []
for i in range(5):
    b.append(my_copy(a))

now I can change the elements of b without influencing the others copies. Unlike if I were to use

b.append(copy.copy(a)) in the loop

I do all this to avoid using copy.deepcopy() which appears to be quite slow. There are now three questions: Does this code generate a deepcopy of my list? And if not, why does it still create copies and not just new references as b.append(a) would? In addition: How can I do this in a more elegant, fast and pythonic way?

Upvotes: 0

Views: 1153

Answers (1)

ebarr
ebarr

Reputation: 7842

There appears to be some misunderstanding here of the difference between a shallow copy and a deep copy. You state in your question that you are appending lists of lists. Let's assume that the following is such a list:

In [32]: x = [[1,2,3],[4,5,6]]

In a shallow copy we copy only the first layer. From the docs:

A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.

In [33]: z = []

# using the method you describe
In [35]: z.append(list(tuple(list(x))))

In [36]: z
Out[36]: [[[1, 2, 3], [4, 5, 6]]]

If we now modify the contents of z we alter x, as we have used a shallow copy.

In [38]: z[0][0][0]=7

In [39]: x
Out[39]: [[7, 2, 3], [4, 5, 6]]

In a deep copy, we make a copy of the object at all levels, essentially creating a clone of the original object. From the docs:

A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

In [40]: import copy 

In [41]: z = []

In [42]: x = [[1,2,3],[4,5,6]]

In [43]: z.append(copy.deepcopy(x))

In [44]: z
Out[44]: [[[1, 2, 3], [4, 5, 6]]]

In [45]: z[0][0][0] = 7

In [46]: x
Out[46]: [[1, 2, 3], [4, 5, 6]]

Numpy is likely the fastest solution to this problem, but you will have to refactor your code to get a benefit. Numpy will not benefit you if you are converting between lists and arrays in the base level of a loop. Instead you should try an vectorise the problem early on and minimize the number of type conversions.

Edit:

Looking at the update question, it seems that there is a very simple solution. If the list in your list only ever contains immutable types you can use either of the following:

def my_copy_1(org):
    return (copy.copy(org[0]),org[1],org[2])

def my_copy_2(org):
    return (org[0][:],org[1],org[2])

Testing the speed on these against your original implementation, I get:

In [2]: a = [[1,2,3],1,2]

In [3]: %timeit tmp.my_copy_orig(a)
100000 loops, best of 3: 2.05 µs per loop

In [4]: %timeit tmp.my_copy_1(a)
100000 loops, best of 3: 2.06 µs per loop

In [5]: %timeit tmp.my_copy_2(a)
1000000 loops, best of 3: 784 ns per loop

It would appear that my_copy_2 is the clear winner here in terms of speed. You can test that it produces the correct behaviour with:

In [6]: a = [[1,2,3],1,2]

In [7]: z = tmp.my_copy_2(a)

In [8]: z[2] = 999

In [9]: z[0][0] = 999

In [10]: a
Out[10]: [[1, 2, 3], 1, 2]

In [11]: z
Out[11]: [[999, 2, 3], 1, 999]

Upvotes: 4

Related Questions