Noah Bogart
Noah Bogart

Reputation: 1767

Copying nested custom objects: alternatives to deepcopy

I'm looking to make a deep copy of a class object that contains a list of class objects, each with their own set of stuff. The objects don't contain anything more exciting than ints and lists (no dicts, no generators waiting to yield, etc). I'm performing the deep copy on between 500-800 objects a loop, and it's really slowing the program down. I realize that this is already inefficient; it currently can't be changed.

Example of how it looks:

import random
import copy

class Base:
    def __init__(self, minimum, maximum, length):
        self.minimum = minimum
        self.maximum = maximum
        self.numbers = [random.randint(minimum, maximum) for _ in range(length)]
        # etc

class Next:
    def __init__(self, minimum, maximum, length, quantity):
        self.minimum = minimum
        self.maximum = maximum
        self.bases = [Base(minimum, maximum, length) for _ in range(quantity)]
        # etc

Because of the actions I'm performing on the objects, I can't shallow copy. I need the contents to be owned by the new variable:

> first = Next(0, 10, 5, 10)
> second = first
> first.bases[0].numbers[1] = 4
> print(first.bases[0].numbers)
> [2, 4, 3, 3, 8]
> print(second.bases[0].numbers)
> [2, 4, 3, 3, 8]
>
> first = Next(0, 10, 5, 10)
> second = copy.deepcopy(first)
> first.bases[0].numbers[1] = 4
> print(first.bases[0].numbers)
> [8, 4, 7, 9, 9]
> print(second.bases[0].numbers)
> [8, 11, 7, 9, 9]

I've tried a couple of different ways, such as using json to serialize and reload the data, but in my tests it's not been nearly fast enough, because I'm stuck reassigning all of the variables each time. My attempt at pulling off a clever self.__dict__ = dct hasn't worked because of the nested objects.

Any ideas for how to efficiently deep copy multiply-nested Python objects without using copy.deepcopy?

Upvotes: 6

Views: 10261

Answers (2)

Noah Bogart
Noah Bogart

Reputation: 1767

Based on cherish's answers here, pickle.loads(pickle.dumps(first)) works about twice as fast per call. I had written it off initially because of an unrelated error when testing it, but on retesting it, it performs well within my needs.

Upvotes: 9

Tadhg McDonald-Jensen
Tadhg McDonald-Jensen

Reputation: 21453

One of the first things that copy.deepcopy looks for is if the object defines it's own __deepcopy__ method So instead of having it figure out how to copy object every time just define your own process.

It would require you have a way of defining a Base object without any element of random-ness for the copies to use, but if you can find a more efficient process of copying your objects you should define it as a __deepcopy__ method to speed up copying process.

Upvotes: 5

Related Questions