Reputation: 35477
I would like to make a deep copy of a dict in Python. Unfortunately, the .deepcopy() method doesn't exist for dict. How do I do that?
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = my_dict.deepcopy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'deepcopy'
>>> my_copy = my_dict.copy()
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
7
The last line should be 3.
I would like modifications to my_dict not to affect the snapshot my_copy.
How do I do that? The solution should be compatible with Python 3.x.
Upvotes: 644
Views: 586277
Reputation: 326
You can use the memory-graph package to graph your data and see what values are shared and what effect your change has:
import memory_graph as mg # see link above for install instructions
import copy
my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
# three different "copies":
c1 = my_dict
c2 = copy.copy(my_dict) # equivalent to: my_dict.copy() or dict(my_dict)
c3 = copy.deepcopy(my_dict)
my_dict['a'][2] = 7
print( c1['a'][2] ) # 7
print( c2['a'][2] ) # 7
print( c3['a'][2] ) # 3
mg.show(locals()) # show the local variables in a graph
c1 is an assignment: nothing is copied, all the values are shared.
c2 is a shallow copy: only the value referenced by the first reference is copied, all the underlying values are shared.
c3 is a deep copy: all the values are copied, nothing is shared.
From the graph it is clear that, to make an independent copy so that your change to my_dict doesn't affect the copy, you need a deep copy (c3). However, when your dictionary is large, a deep copy can be slow and require a lot of extra memory. In that case, consider a custom copy as a fourth option: a shallow copy of the dictionary plus a shallow copy of the value you want to change:
import memory_graph as mg # see link above for install instructions
my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
# custom copy as fourth option:
c4 = my_dict.copy()
c4['a'] = my_dict['a'].copy()
my_dict['a'][2] = 7
print( c4['a'][2] ) # 3
mg.show(locals()) # show the local variables in a graph
This can be more efficient (in time and space), but c4 still shares its other values with my_dict, so you have to be careful with them to avoid bugs. This is similar to the copy-on-write concept.
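The custom copy can be wrapped in a small helper. This is only a sketch: the name copy_with_private_values and its keys parameter are made up for illustration, and it assumes the values you intend to change are themselves flat (one level deep).

```python
import copy

def copy_with_private_values(source, keys):
    """Shallow-copy `source`, then shallow-copy only the values under `keys`."""
    result = source.copy()                      # new dict, values still shared
    for key in keys:
        result[key] = copy.copy(source[key])    # this value is no longer shared
    return result

my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
c4 = copy_with_private_values(my_dict, ['a'])

my_dict['a'][2] = 7
print(c4['a'][2])                # 3: 'a' was copied
print(c4['b'] is my_dict['b'])   # True: 'b' is still shared
```

Anything not listed in keys stays shared, so a later change to my_dict['b'] would still show up in c4.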
Full disclosure: I am the developer of memory-graph.
Upvotes: 1
Reputation: 19289
@Rob suggested a good alternative to copy.deepcopy() if you are planning to create standard APIs for your data (database or web request JSON payloads).
>>> import json
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> json.loads(json.dumps(my_dict))
{'a': [1, 2, 3], 'b': [4, 5, 6]}
This approach is also thread safe.
But it only works for jsonifiable (serializable) objects like str, dict, list, int, float, and None.
Sometimes that jsonifiable constraint can be a good thing because it forces you to make your data structures compliant with standard database fixture and web request formats (for creating best-practice APIs).
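To make that constraint concrete, here is a small sketch (the sample data is invented) of what a JSON round trip does to values that are not plain JSON types: non-string keys become strings, tuples become lists, and unsupported types such as set are rejected.

```python
import json

data = {1: (1, 2), 'when': None}
round_tripped = json.loads(json.dumps(data))
print(round_tripped)   # {'1': [1, 2], 'when': None}

error = None
try:
    json.dumps({'tags': {1, 2}})   # a set is not JSON serializable
except TypeError as exc:
    error = exc
print(type(error).__name__)        # TypeError
```

So the result is an independent copy, but not always an equal one.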
If your data structures don't work directly with json.dumps (for example datetime objects and your custom classes), they will need to be coerced into a string or other standard type before serializing with json.dumps().
And you'll need to run a custom deserializer as well after json.loads():
>>> from datetime import datetime as dt
>>> my_dict = {'a': (1,), 'b': dt(2023, 4, 9).isoformat()}
>>> d = json.loads(json.dumps(my_dict))
>>> d
{'a': [1], 'b': '2023-04-09T00:00:00'}
>>> for k in d:
...     try:
...         d[k] = dt.fromisoformat(d[k])
...     except (TypeError, ValueError):  # not an ISO date string
...         pass
...
>>> d
{'a': [1], 'b': datetime.datetime(2023, 4, 9, 0, 0)}
Of course you need to do the serialization and deserialization on special objects recursively.
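One way to do this recursively is sketched below. The helper names (encode_special, restore, json_deepcopy) are invented for this example, and it assumes datetime is the only special type and that every ISO-formatted string in the data was originally a datetime, which may not hold for your data.

```python
import json
from datetime import datetime

def encode_special(obj):
    # json.dumps calls this for any object it cannot serialize itself
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"cannot serialize {type(obj).__name__}")

def restore(value):
    # walk the decoded structure and turn ISO strings back into datetimes
    if isinstance(value, dict):
        return {k: restore(v) for k, v in value.items()}
    if isinstance(value, list):
        return [restore(v) for v in value]
    if isinstance(value, str):
        try:
            return datetime.fromisoformat(value)
        except ValueError:
            return value          # an ordinary string, leave it alone
    return value

def json_deepcopy(data):
    return restore(json.loads(json.dumps(data, default=encode_special)))

original = {'when': datetime(2023, 4, 9),
            'items': [{'t': datetime(2020, 1, 1, 12, 30)}],
            'name': 'x',
            'tup': (1, 2)}
clone = json_deepcopy(original)
```

Note that the tuple under 'tup' still comes back as a list; restoring it would need extra type information that JSON does not carry.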
Sometimes that's a good thing.
This process normalizes all your objects to types that are directly serializable (for example, tuples become lists) and you can be sure they'll match a reproducible data schema (for relational database storage).
And it's thread safe. The built-in copy.deepcopy() is NOT thread safe! If you use deepcopy in threaded or async code, data that changes while it is being copied can crash your program or corrupt your data unexpectedly, long after you've forgotten about your code.
Upvotes: 1
Reputation: 2466
Python 3.x
from copy import deepcopy
# define the original dictionary
original_dict = {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5, 'e': 6}}
# make a deep copy of the original dictionary
new_dict = deepcopy(original_dict)
# modify the dictionary in a loop
for key in new_dict:
    if isinstance(new_dict[key], dict) and 'e' in new_dict[key]:
        del new_dict[key]['e']
# print the original and modified dictionaries
print('Original dictionary:', original_dict)
print('Modified dictionary:', new_dict)
Which would yield:
Original dictionary: {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5, 'e': 6}}
Modified dictionary: {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5}}
Without new_dict = deepcopy(original_dict), the nested dictionary would be shared, so deleting 'e' would remove it from original_dict as well.
A copy is also needed when a loop adds or removes keys of the dictionary it is iterating over. If the loop were for key in original_dict and it deleted entries from original_dict itself, a RuntimeError would be raised:
"RuntimeError: dictionary changed size during iteration"
So in order to change a dictionary's size within an iteration, iterate over a copy of the dictionary.
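The RuntimeError quoted above appears when keys are added to or removed from the dictionary that is currently being iterated. A small sketch with invented data, showing the failure and one way around it (iterating over a snapshot of the keys):

```python
scores = {'a': 1, 'b': 2, 'c': 3}

error_message = None
try:
    for key in scores:
        if scores[key] % 2:       # odd value
            del scores[key]       # changes the dict's size mid-iteration
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)              # dictionary changed size during iteration

# Iterating over list(scores) walks a copy of the keys, so deleting is safe.
for key in list(scores):
    if scores[key] % 2:
        del scores[key]
print(scores)                     # {'b': 2}
```

For this particular case a copy of the keys is enough; a deep copy is only needed when the nested values must stay independent too.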
Here is an example function that removes an element from a dictionary:
def remove_hostname(domain, hostname):
    # iterate over a deep copy so entries can be deleted from the original
    domain_copy = deepcopy(domain)
    for domains, hosts in domain_copy.items():
        for host, port in hosts.items():
            if host == hostname:
                del domain[domains][host]
    return domain
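For large structures, a lighter variation is possible. This sketch is not the function above: it swaps the deep copy for a snapshot of each inner dictionary's keys, and it assumes the same layout of domain name mapped to a {host: port} dictionary. The name remove_hostname_light and the sample data are invented.

```python
def remove_hostname_light(domain, hostname):
    """Delete `hostname` from every host table in `domain`, in place."""
    for hosts in domain.values():
        # list(hosts) copies only the keys, so deleting from hosts is safe
        for host in list(hosts):
            if host == hostname:
                del hosts[host]
    return domain

domain = {'example.com': {'www': 80, 'mail': 25},
          'example.org': {'www': 443}}
remove_hostname_light(domain, 'www')
print(domain)   # {'example.com': {'mail': 25}, 'example.org': {}}
```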
Upvotes: 81
Reputation: 2880
dict.copy() makes a shallow copy of a dictionary.
id() is a built-in function that returns the identity of an object (in CPython, its memory address).
First you need to understand why this particular problem is happening.
In [1]: my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
In [2]: my_copy = my_dict.copy()
In [3]: id(my_dict)
Out[3]: 140190444167808
In [4]: id(my_copy)
Out[4]: 140190444170328
In [5]: id(my_copy['a'])
Out[5]: 140190444024104
In [6]: id(my_dict['a'])
Out[6]: 140190444024104
For key 'a', both dicts refer to the same list object, as the identical ids show.
Therefore, when you change a value in the list through my_dict, the list in my_copy changes as well.
Solution for data structure mentioned in the question:
In [7]: my_copy = {key: value[:] for key, value in my_dict.items()}
In [8]: id(my_copy['a'])
Out[8]: 140190444024176
Or you can use deepcopy as mentioned above.
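The same check can be written with the is operator instead of comparing id() values by eye. The sketch below (the variable names are invented) also shows the limit of the value[:] approach: it copies only one level, so it fits the data in the question but not more deeply nested lists.

```python
import copy

my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}

shallow = my_dict.copy()
per_value = {key: value[:] for key, value in my_dict.items()}
deep = copy.deepcopy(my_dict)

print(shallow['a'] is my_dict['a'])     # True: the list is shared
print(per_value['a'] is my_dict['a'])   # False: the list was copied
print(deep['a'] is my_dict['a'])        # False

# With one more level of nesting, value[:] still shares the inner lists.
nested = {'a': [[1, 2], [3]]}
one_level = {key: value[:] for key, value in nested.items()}
print(one_level['a'] is nested['a'])         # False
print(one_level['a'][0] is nested['a'][0])   # True: inner list is shared
```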
Upvotes: 77
Reputation: 391664
How about:
import copy
d = { ... }
d2 = copy.deepcopy(d)
Python 2 or 3:
Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import copy
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = copy.deepcopy(my_dict)
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
3
>>>
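A further property of copy.deepcopy() that can matter: it keeps track of the objects it has already copied, so references shared inside the original stay shared inside the copy, and self-referencing structures do not cause infinite recursion. A short sketch with invented data:

```python
import copy

shared = [1, 2, 3]
data = {'x': shared, 'y': shared}   # two keys pointing at the same list
data['self'] = data                 # the dict also refers to itself

clone = copy.deepcopy(data)

print(clone['x'] is clone['y'])     # True: sharing is preserved in the copy
print(clone['x'] is shared)         # False: but nothing is shared with the original
print(clone['self'] is clone)       # True: the cycle points at the copy
```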
Upvotes: 892