Olivier Grégoire

Reputation: 35477

Deep copy of a dict in python

I would like to make a deep copy of a dict in Python. Unfortunately the .deepcopy() method doesn't exist for dict. How do I do that?

>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = my_dict.deepcopy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'deepcopy'
>>> my_copy = my_dict.copy()
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
7

The last line should be 3.

I would like modifications to my_dict not to affect the snapshot my_copy.

How do I do that? The solution should be compatible with Python 3.x.

Upvotes: 644

Views: 586277

Answers (5)

bterwijn

Reputation: 326

You can use the memory-graph package to graph your data and see what values are shared and what effect your change has:

import memory_graph as mg # see link above for install instructions
import copy

my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}

# three different "copies":
c1 = my_dict
c2 = copy.copy(my_dict) # equivalent to:  my_dict.copy()  or  dict(my_dict)
c3 = copy.deepcopy(my_dict)

my_dict['a'][2] = 7    
print( c1['a'][2] ) # 7
print( c2['a'][2] ) # 7 
print( c3['a'][2] ) # 3

mg.show(locals()) # show the local variables in a graph

  • c1 is an assignment: nothing is copied, all the values are shared
  • c2 is a shallow copy: only the value referenced by the first reference is copied, all the underlying values are shared
  • c3 is a deep copy: all the values are copied, nothing is shared

[graph: three copy options]

From the graph it is clear that, to make an independent copy so that your change to my_dict doesn't affect the copy, you need a deep copy (c3). However, when your dictionary is large, a deep copy can be slow and require a lot of extra memory. In that case, consider a custom copy as a fourth option: a shallow copy of the dictionary plus a shallow copy of only the value you are going to change:

import memory_graph as mg # see link above for install instructions

my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}

# custom copy as fourth option:
c4 = my_dict.copy()
c4['a'] = my_dict['a'].copy()

my_dict['a'][2] = 7
print( c4['a'][2] ) # 3

mg.show(locals()) # show the local variables in a graph

[graph: fourth copy option]

This can be more efficient in time and space, but c4 still shares the other values with my_dict, which you have to keep in mind to avoid bugs. This is similar to the copy-on-write concept.
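For example, continuing the snippet above, a change through the still-shared 'b' list shows up in c4, while the separately copied 'a' list is unaffected:

my_dict['b'][0] = 99
print( c4['b'][0] ) # 99 -> 'b' is still shared with my_dict
print( c4['a'][2] ) # 3  -> 'a' was copied separately, so it is independent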

Full disclosure: I am the developer of memory-graph.

Upvotes: 1

hobs

Reputation: 19289

@Rob suggested a good alternative to copy.deepcopy() if you are planning to create standard APIs for your data (database or web-request JSON payloads).

>>> import json
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> json.loads(json.dumps(my_dict))
{'a': [1, 2, 3], 'b': [4, 5, 6]}

This approach is also thread-safe. But it only works for JSON-serializable objects such as str, dict, list, int, float, and None. Sometimes that constraint can be a good thing, because it forces you to make your data structures compliant with standard database-fixture and web-request formats (for creating best-practice APIs).
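For example, a value json.dumps() can't handle fails immediately (output from a recent Python 3; the exact message may vary slightly by version):

>>> import json
>>> from datetime import datetime
>>> json.dumps({'when': datetime(2023, 4, 9)})
Traceback (most recent call last):
  ...
TypeError: Object of type datetime is not JSON serializable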

If your data structures don't work directly with json.dumps() (for example datetime objects and your custom classes), they will need to be coerced into a string or another standard type before serializing with json.dumps(). And you'll need to run a custom deserializer as well after json.loads():

>>> from datetime import datetime as dt
>>> my_dict = {'a': (1,), 'b': dt(2023, 4, 9).isoformat()}
>>> d = json.loads(json.dumps(my_dict))
>>> d
{'a': [1], 'b': '2023-04-09T00:00:00'}
>>> for k in d:
...     try:
...         d[k] = dt.fromisoformat(d[k])
...     except:
...         pass
...
>>> d
{'a': [1], 'b': datetime.datetime(2023, 4, 9, 0, 0)}

Of course you need to do this serialization and deserialization of special objects recursively. Sometimes that's a good thing: the process normalizes all your objects to types that are directly serializable (for example, tuples become lists), and you can be sure they'll match a reproducible data schema (for relational database storage).
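If you only need the one-way flattening, json.dumps() also accepts a default= hook that is called for every object it can't serialize; passing default=str is a quick, lossy sketch of that idea (you still need a custom deserializer, as above, to get real datetime objects back):

>>> my_dict = {'a': (1,), 'b': dt(2023, 4, 9)}
>>> json.loads(json.dumps(my_dict, default=str))
{'a': [1], 'b': '2023-04-09 00:00:00'}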

And it's thread-safe. The built-in copy.deepcopy() is NOT thread-safe! If you use deepcopy within async code, it can crash your program or corrupt your data unexpectedly, long after you've forgotten about that code.

Upvotes: 1

xpros

Reputation: 2466

Python 3.x

from copy import deepcopy

# define the original dictionary
original_dict = {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5, 'e': 6}}

# make a deep copy of the original dictionary
new_dict = deepcopy(original_dict)

# modify the dictionary in a loop
for key in new_dict:
    if isinstance(new_dict[key], dict) and 'e' in new_dict[key]:
        del new_dict[key]['e']

# print the original and modified dictionaries
print('Original dictionary:', original_dict)
print('Modified dictionary:', new_dict)

Which would yield:

Original dictionary: {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5, 'e': 6}}
Modified dictionary: {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5}}

Without new_dict = deepcopy(original_dict), the 'e' element cannot be removed this way.

Why? Because if the loop were for key in original_dict and an attempt were made to modify original_dict itself, a RuntimeError would be raised:

"RuntimeError: dictionary changed size during iteration"

So in order to modify a dictionary while iterating over it, a copy of the dictionary must be used.
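A minimal throwaway illustration of that error (the error is raised when you remove a key from the very dict you are iterating over):

d = {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5, 'e': 6}}

for key in d:                                      # iterating over d itself
    if isinstance(d[key], dict) and 'e' in d[key]:
        del d[key]                                 # shrinks d mid-iteration
# RuntimeError: dictionary changed size during iteration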

Here is an example function that removes an element from a dictionary:

def remove_hostname(domain, hostname):
    # iterate over a deep copy so we can safely delete entries from the original
    domain_copy = deepcopy(domain)
    for domains, hosts in domain_copy.items():
        for host, port in hosts.items():
            if host == hostname:
                del domain[domains][host]
    return domain
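For example, with a hypothetical mapping of domain names to {hostname: port} dicts (sample data invented for illustration):

domains = {
    'example.com': {'mail': 25, 'web': 80},
    'example.org': {'web': 443},
}
print(remove_hostname(domains, 'web'))
# {'example.com': {'mail': 25}, 'example.org': {}}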

Upvotes: 81

theBuzzyCoder

Reputation: 2880

dict.copy() is a shallow-copy method for dictionaries.
id() is a built-in function that gives you the identity of an object (its memory address in CPython).

First, you need to understand why this particular problem is happening.

In [1]: my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}

In [2]: my_copy = my_dict.copy()

In [3]: id(my_dict)
Out[3]: 140190444167808

In [4]: id(my_copy)
Out[4]: 140190444170328

In [5]: id(my_copy['a'])
Out[5]: 140190444024104

In [6]: id(my_dict['a'])
Out[6]: 140190444024104

The list stored under key 'a' is the same object in both dicts.
Therefore, when you change the list through my_dict, the list seen through my_copy changes as well.


Solution for the data structure mentioned in the question:

In [7]: my_copy = {key: value[:] for key, value in my_dict.items()}

In [8]: id(my_copy['a'])
Out[8]: 140190444024176
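
With the lists copied, a later change to my_dict no longer leaks into my_copy:

In [9]: my_dict['a'][2] = 7

In [10]: my_copy['a'][2]
Out[10]: 3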

Or you can use copy.deepcopy() as mentioned above (the comprehension works here because every value in the question's dict is a list).

Upvotes: 77

Lasse V. Karlsen

Reputation: 391664

How about:

import copy
d = { ... }
d2 = copy.deepcopy(d)

Python 2 or 3:

Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import copy
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = copy.deepcopy(my_dict)
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
3
>>>

Upvotes: 892
