Reputation: 402633
I've been working on a performance-critical application which frequently requires making copies of a 2D list of integers and modifying the copy (I'm implementing the minimax algorithm).
I've noticed there is a huge difference in performance between a copy, and a deepcopy on lists with the same number of elements, and I'd like to understand if my thinking is correct.
To reproduce my problem, run the following code:
import copy
import numpy as np
np.random.seed(0)
lst1 = np.random.randint(100, size=1000 * 1000).tolist()
lst2 = np.random.randint(100, size=(1000, 1000)).tolist()
Now, timing the statements below, you should see timings similar to mine.
%timeit copy.copy(lst1)
%timeit lst1.copy()
%timeit copy.deepcopy(lst2)
5 ms ± 49.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.47 ms ± 551 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.61 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Both lst1 and lst2 have a million elements, yet copying the former is reliably 200x faster than copying the nested list with the same number of elements. I thought this might be because making deep copies of nested lists requires some slow recursive implementation, so I tried
%timeit copy.deepcopy(lst1)
1.43 s ± 90.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
And the timings still show a massive slowdown. I've checked the docs, but not much explanation was offered. However, from the timings, I suspect that deepcopy is copying each int as well, creating new integers. But this seems like a wasteful thing to do.

Am I right in my thinking here? What is deepcopy doing here that list.copy and a shallow copy don't?
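In fact, a quick identity check on a tiny nested list (my own experiment, not from any docs) seems to show that the inner lists get copied but the int objects themselves are shared:

```python
import copy

lst = [[1, 2], [3, 4]]
deep = copy.deepcopy(lst)

# The inner lists are new objects...
print(deep[0] is lst[0])        # False

# ...but the ints inside are the very same objects, not new integers:
print(deep[0][0] is lst[0][0])  # True
```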
I've seen deepcopy() is extremely slow but it seems that question is asking for an alternative rather than an explanation (it wasn't clear to me).
Upvotes: 4
Views: 4689
Reputation: 1454
https://docs.python.org/2/library/copy.html
The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):
So effectively, the shallow copy creates a new list and populates it with a reference to every element of the original list. Because every element of the original list is itself a list, it's much faster to store a reference to it than to create a new copy. deepcopy does some clever bookkeeping in how it copies each element in order to avoid errors, but you don't need to understand that to see why one shallow copy is faster than a deep copy.
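To make that concrete, here's a small sketch (mine, not from the docs) showing that a shallow copy shares its inner lists with the original:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)

# The outer list is new, but both point at the *same* inner lists:
print(shallow[0] is original[0])  # True

# So mutating an inner list through the copy is visible in the original:
shallow[0].append(99)
print(original[0])  # [1, 2, 99]
```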
Upvotes: 0
Reputation: 281252
deepcopy isn't copying the ints. There's no way it could do that anyway.

deepcopy is slow because it needs to handle the full complexity of a deep copy, even if that turns out to be unnecessary. That includes dispatching to the appropriate copier for every object it finds, even if the copier turns out to basically just be lambda x: x. That includes maintaining a memo dict and keeping track of every object copied, to handle duplicate references to the same objects, even if there are none. That includes special copy handling for data structures like lists and dicts, so it doesn't go into an infinite recursion when trying to copy a data structure with recursive references.
All of that has to be done no matter whether it pays off. All of it is expensive.
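You can see that recursion-handling machinery pay off in a case where it's actually needed; here's a small sketch with a self-referencing list:

```python
import copy

# A list that contains itself -- a recursive reference:
a = []
a.append(a)

# deepcopy's memo dict lets it reproduce the cycle
# instead of recursing forever:
b = copy.deepcopy(a)
print(b is a)    # False: a new list was created
print(b[0] is b) # True: the cycle is preserved in the copy
```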
Also, deepcopy is pure Python. That doesn't help. Comparing deepcopy to pickle.loads(pickle.dumps(whatever)), which performs a very similar job, pickle wins handily due to the C implementation. (On Python 2, replace pickle with cPickle.) pickle still loses hard to an implementation that takes advantage of the known structure of the input, though:
In [15]: x = [[0]*1000 for i in range(1000)]
In [16]: %timeit copy.deepcopy(x)
1.05 s ± 5.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [17]: %timeit pickle.loads(pickle.dumps(x))
78 ms ± 4.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [18]: %timeit [l[:] for l in x]
4.56 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
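And for a 2D list of ints, that last comprehension really does behave like a deep copy; a quick sanity check (my own sketch) shows the inner lists come out independent:

```python
x = [[0] * 3 for _ in range(2)]
y = [row[:] for row in x]

# Every inner list is a fresh object:
print(y[0] is x[0])  # False

# So modifying the copy leaves the original untouched:
y[0][0] = 42
print(x[0][0])  # 0
```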
Upvotes: 7
Reputation: 1
In programming, a deep copy is equivalent to a physical copy: it is an actual copy of the original object. In most programming tools, you can play around with it and modify it without affecting the original object. A shallow copy, on the other hand, holds references to the original object's contents; if you modify those shared contents, the change affects the original as well. In short, since a deep copy is an actual copy of the original object, it is heavier than a shallow copy, which just points at the original.
Shallow copy: You can have a picture(s) of your new furniture, and get an idea of what it really looks like. You can easily carry around the picture.
Deep copy: You can go to the furniture shop and look at the real furniture. You probably can't carry it around easily, and you may need some help to take it home.
Upvotes: -1