binpy

Reputation: 4194

Django clone the recursive objects

Previously, I had a problem when I wanted to clone objects recursively. I know the simple way to clone an object is like this:

obj = Foo.objects.get(pk=<some_existing_pk>)
obj.pk = None
obj.save()

But I want to go deeper. For example, I have this models.py:

class Post(TimeStampedModel):
    author = models.ForeignKey(User, related_name='posts',
                               on_delete=models.CASCADE)
    title = models.CharField(_('Title'), max_length=200)
    content = models.TextField(_('Content'))

    ...


class Comment(TimeStampedModel):
    author = models.ForeignKey(User, related_name='comments',
                               on_delete=models.CASCADE)
    post = models.ForeignKey(Post, on_delete=models.CASCADE)
    comment = models.TextField(_('Comment'))

    ...


class CommentAttribute(TimeStampedModel):
    comment = models.OneToOneField(Comment, related_name='comment_attribute',
                                   on_delete=models.CASCADE)
    is_bookmark = models.BooleanField(default=False)

    ...


class PostComment(TimeStampedModel):
    post = models.ForeignKey(Post, related_name='post_comments',
                             on_delete=models.CASCADE)
    comments = models.ManyToManyField(Comment)

    ...

When I clone a parent Post object, its child objects (Comment, CommentAttribute and PostComment) should also be cloned and attached to the new cloned Post. The child models are dynamic, so I want to keep it simple by creating a tool like an object cloner.

The snippet below is what I have done so far:

from django.db.utils import IntegrityError


class ObjectCloner(object):
    """
    [1]. The simple way with global configuration:
    >>> cloner = ObjectCloner()
    >>> cloner.set_objects = [obj1, obj2]   # or can be queryset
    >>> cloner.include_childs = True
    >>> cloner.max_clones = 1
    >>> cloner.execute()

    [2]. Clone the objects with custom configuration per-each objects.
    >>> cloner = ObjectCloner()
    >>> cloner.set_objects = [
        {
            'object': obj1,
            'include_childs': True,
            'max_clones': 2
        },
        {
            'object': obj2,
            'include_childs': False,
            'max_clones': 1
        }
    ]
    >>> cloner.execute()
    """
    set_objects = []            # list/queryset of objects to clone.
    include_childs = True       # include all their childs or not.
    max_clones = 1              # maximum clone per-objects.

    def clone_object(self, object):
        """
        function to clone the object.
        :param `object` is an object to clone, e.g: <Post: object(1)>
        :return new object.
        """
        try:
            object.pk = None
            object.save()
            return object
        except IntegrityError:
            return None

    def clone_childs(self, object):
        """
        function to clone all childs of current `object`.
        :param `object` is a cloned parent object, e.g: <Post: object(1)>
        :return
        """
        # bypass the none object.
        if object is None:
            return

        # find the related objects contains with this current object.
        # e.g: (<ManyToOneRel: app.comment>,)
        related_objects = object._meta.related_objects

        if len(related_objects) > 0:
            for relation in related_objects:
                # find the related field name in the child object, e.g: 'post'
                remote_field_name = relation.remote_field.name

                # find all childs who have the same parent.
                # e.g: childs = Comment.objects.filter(post=object)
                childs = relation.related_model.objects.all()

                for old_child in childs:
                    new_child = self.clone_object(old_child)

                    if new_child is not None:
                        # FIXME: When the child field is an M2M field, we get this error.
                        # "TypeError: Direct assignment to the forward side of a many-to-many set is prohibited. Use comments.set() instead."
                        # how can I clone that M2M values?
                        setattr(new_child, remote_field_name, object)
                        new_child.save()

                    self.clone_childs(new_child)
        return

    def execute(self):
        include_childs = self.include_childs
        max_clones = self.max_clones
        new_objects = []

        for old_object in self.set_objects:
            # custom per-each objects by using dict {}.
            if isinstance(old_object, dict):
                include_childs = old_object.get('include_childs', True)
                max_clones = old_object.get('max_clones', 1)
                old_object = old_object.get('object')  # assigned as object or None.

            for _ in range(max_clones):
                new_object = self.clone_object(old_object)
                if new_object is not None:
                    if include_childs:
                        self.clone_childs(new_object)
                    new_objects.append(new_object)

        return new_objects

But the problem is that when the child field is an M2M field, we get this error:

>>> cloner.set_objects = [post]
>>> cloner.execute()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/agus/envs/env-django-cloner/django-object-cloner/object_cloner_demo/app/utils.py", line 114, in execute
    self.clone_childs(new_object)
  File "/home/agus/envs/env-django-cloner/django-object-cloner/object_cloner_demo/app/utils.py", line 79, in clone_childs
    self.clone_childs(new_child)
  File "/home/agus/envs/env-django-cloner/django-object-cloner/object_cloner_demo/app/utils.py", line 76, in clone_childs
    setattr(new_child, remote_field_name, object)
  File "/home/agus/envs/env-django-cloner/lib/python3.7/site-packages/django/db/models/fields/related_descriptors.py", line 546, in __set__
    % self._get_set_deprecation_msg_params(),
TypeError: Direct assignment to the forward side of a many-to-many set is prohibited. Use comments.set() instead.
>>> 

The error comes from setattr(...), and it says "Use comments.set() instead", but I'm still confused about how to update that M2M value.

new_child = self.clone_object(old_child)

if new_child is not None:
    setattr(new_child, remote_field_name, object)
    new_child.save()

I have also tried the snippet below, but it still has a bug: too many M2M objects are cloned, and they are not added to the M2M values.

if new_child is not None:
    # check the object_type
    object_type = getattr(new_child, remote_field_name)

    if hasattr(object_type, 'pk'):
        # this means `object_type` is a real object,
        # so we can directly use `setattr(...)`
        # to replace the old relation value with the new one.
        setattr(new_child, remote_field_name, object)

    elif hasattr(object_type, '_queryset_class'):
        # this means `object_type` is an m2m manager (ManyRelatedManager).
        # django.db.models.fields.related_descriptors.\
        # create_forward_many_to_many_manager.<locals>.ManyRelatedManager

        # check the old m2m values, and assign them to the new object.
        # FIXME: IN THIS CASE STILL GOT AN ERROR
        old_m2m_values = getattr(old_child, remote_field_name).all()
        object_type.add(*old_m2m_values)

    new_child.save()

Upvotes: 6

Views: 3507

Answers (5)

Daniele Zanotelli

Reputation: 31

I enhanced the nice code from Mario Orlandi's answer, since I found some issues when using it:

  • sometimes cascading objects were created more than once. This happens because the database may be non-normalized, that is, it may contain circular references among objects, which lead clone_object to reach the same object more than once and make extra copies. Or worse, it may reach an already cloned object and treat it as if it were the object to clone (creating a clone of a clone with only partial references copied)
  • m2m relations link new objects with old ones.

What I needed instead was a complete "tree" replication of that model and its related models, including replication of the m2m relations between clones (rather than linking the clones to some of the source objects).

So I encapsulated the former clone_object() function in a closure. By keeping track of the clones in the cloned dict, I was able to check for already created clones and avoid generating duplicates.

Here is my code:

def clone_object(obj, attrs={}):
    """
    Adaption of https://stackoverflow.com/a/61729857/15274340

    """
    # we use a closure to keep track of already cloned models
    cloned = {}  # {model_cls: {obj.pk: clone.pk}}

    def _clone_obj(obj, attrs):
        nonlocal cloned

        cloned_pk = cloned.get(obj._meta.model, {}).get(obj.pk, None)
        clones_pks = cloned.get(obj._meta.model, {}).values()
        if cloned_pk:
            # Object has already been cloned before. Use that clone.
            clone = obj._meta.model.objects.get(pk=cloned_pk)
        elif obj.pk in clones_pks:
            # If it's the second time that we get to this object, cos we have
            # circular relations, it may be that object itself is a clone, but
            # the relation from which we came from has not been set yet.
            # If so, we just need to be sure to update its relations, without
            # creating another clone
            clone = obj
            # retrieve the src object, so we can replicate M2M relations later
            obj_pk = next(k for k, v in cloned.get(clone._meta.model).items()
                          if v == clone.pk)
            obj = clone._meta.model.objects.get(pk=obj_pk)
        else:
            # we start by building a "flat" clone
            clone = obj._meta.model.objects.get(pk=obj.pk)
            clone.pk = None

        # if caller specified some attributes to be overridden,
        # use them
        for key, value in attrs.items():
            setattr(clone, key, value)

        # save the partial clone to have a valid ID assigned
        clone.save()

        # save the clone pk for further retrieving
        if obj._meta.model not in cloned:
            cloned[obj._meta.model] = {}
        cloned[obj._meta.model][obj.pk] = clone.pk

        # Scan field to further investigate relations
        fields = clone._meta.get_fields()
        for field in fields:
            # Manage M2M fields by replicating all related records
            # found on parent "obj" into "clone"
            if not field.auto_created and field.many_to_many:
                for src in getattr(obj, field.name).all():
                    # retrieve the cloned target object
                    dst_pk = cloned[src._meta.model][src.pk]
                    getattr(clone, field.name).add(dst_pk)

            # Manage 1-N and 1-1 relations by cloning child objects
            if field.auto_created and field.is_relation:
                if field.many_to_many:
                    # do nothing
                    pass
                else:
                    # provide "clone" object to replace "obj"
                    # on remote field
                    attrs = {
                        field.remote_field.name: clone
                    }
                    children = field.related_model.objects.filter(**{field.remote_field.name: obj})
                    for child in children:
                        _clone_obj(child, attrs)

        return clone
    return _clone_obj(obj, attrs)
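
The bookkeeping idea behind the cloned dict can be seen in isolation with plain Python objects. The following is only a minimal sketch under stated assumptions: Node and clone_graph are hypothetical names used for illustration, not part of Django or of the code above, and the links list merely plays the role of an M2M field.

```python
class Node:
    """Hypothetical stand-in for a model instance; `links` mimics an M2M field."""

    def __init__(self, name):
        self.name = name
        self.links = []


def clone_graph(root):
    # {id(source): clone}, the same role as the `cloned` dict in the closure
    cloned = {}

    def _clone(node):
        if id(node) in cloned:
            # Already cloned earlier: reuse that clone instead of duplicating.
            return cloned[id(node)]
        copy = Node(node.name)
        # Register the clone before following links so that cycles terminate.
        cloned[id(node)] = copy
        copy.links = [_clone(target) for target in node.links]
        return copy

    return _clone(root)


a, b = Node("a"), Node("b")
a.links.append(b)
b.links.append(a)  # circular reference between the two sources

a2 = clone_graph(a)
```

Here a2.links[0] is the clone of b, and that clone links back to a2 rather than to the source a, which is the "clones link to clones" behaviour described above.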

Upvotes: 1

Mario Orlandi

Reputation: 5849

I tried to solve this interesting problem with some working code... that was tougher than I initially thought!

I departed from your original solution since I had some difficulty in following the ObjectCloner logic.

The simplest solution I can think of is given below; instead of using a class, I opted to have a single helper function clone_object(), which deals with a single object.

You can of course use a second function to deal with a list of objects or queryset, by scanning the sequence and calling clone_object() multiple times.

def clone_object(obj, attrs={}):

    # we start by building a "flat" clone
    clone = obj._meta.model.objects.get(pk=obj.pk)
    clone.pk = None

    # if caller specified some attributes to be overridden, 
    # use them
    for key, value in attrs.items():
        setattr(clone, key, value)

    # save the partial clone to have a valid ID assigned
    clone.save()

    # Scan field to further investigate relations
    fields = clone._meta.get_fields()
    for field in fields:

        # Manage M2M fields by replicating all related records 
        # found on parent "obj" into "clone"
        if not field.auto_created and field.many_to_many:
            for row in getattr(obj, field.name).all():
                getattr(clone, field.name).add(row)

        # Manage 1-N and 1-1 relations by cloning child objects
        if field.auto_created and field.is_relation:
            if field.many_to_many:
                # do nothing
                pass
            else:
                # provide "clone" object to replace "obj" 
                # on remote field
                attrs = {
                    field.remote_field.name: clone
                }
                children = field.related_model.objects.filter(**{field.remote_field.name: obj})
                for child in children:
                    clone_object(child, attrs)

    return clone

A POC sample project, tested with Python 3.7.6 and Django 3.0.6, has been saved in a public repo on github:

https://github.com/morlandi/test-django-clone
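
The "second function" for lists or querysets mentioned above could be as small as the sketch below. The name clone_many and its copies parameter are assumptions made for illustration. The clone function is passed in as an argument so the sketch runs without a Django project.

```python
def clone_many(objects, clone_fn, copies=1):
    """Call clone_fn on every object `copies` times and collect the clones."""
    clones = []
    for obj in objects:
        for _ in range(copies):
            clones.append(clone_fn(obj))
    return clones


# Stand-in demo: strings instead of model instances, and a trivial "clone".
demo = clone_many(["a", "b"], lambda s: s + "-copy", copies=2)
```

With the helper above, a call would presumably look like clone_many(Post.objects.all(), clone_object). Since clone_object() re-fetches the source row and leaves obj untouched, calling it several times on the same object should be safe.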

Upvotes: 15

auvipy

Reputation: 1188

It would be easier to help if you could tell us which version of Django you are using and what you are actually trying to achieve by cloning. Django related fields work differently across versions, since forward and backward references are handled in different ways. So, could you describe what you are actually trying to do with your code?

Upvotes: 0

Oleg Russkin

Reputation: 4404

First, a few concerns:

  • related_objects is limited: it returns only reverse relations, i.e. cases where the child has a foreign key to this parent, not foreign keys on this parent pointing to a child. While this may be a valid approach (top to bottom), in reality relations may be organized differently, or a child may have another foreign key to another model, which may need to be cloned as well. You need to look at all fields and relations.

Not to mention Django recommendation:

(related_objects is) Private API intended only to be used by Django itself; get_fields() combined with filtering of field properties is the public API for obtaining this field list.


I would suggest a slightly different approach.

Use get_fields() instead of related_objects.

fields = object._meta.get_fields() will return a list of all the fields on the model: those defined on the model itself as well as the forward/reverse access fields added automatically by Django (like the ones returned by related_objects).

This list can be filtered to get only required fields:

  • field.is_relation - will be True for relations and ForeignKey fields, ManyToMany fields etc

  • field.auto_created - will be True for fields auto-created by Django: reverse relations and the pk/id AutoField (though the latter has is_relation == False)

  • field.many_to_many - will be True for ManyToMany fields and relations

This way you can select the required fields or relations: forward or reverse, many-to-many or not. Knowing the exact type of relation, you can then create objects accordingly, e.g. by adding to the set for ManyToMany.
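
As a rough sketch of that filtering, the helper below sorts a field into a bucket using only the three flags listed above. The function name classify, the bucket labels and the stand-in objects are all assumptions made for illustration. The stand-ins carry assumed flag values so that the snippet runs without a Django project. With real models, the input would come from object._meta.get_fields().

```python
from types import SimpleNamespace


def classify(field):
    """Label a field using only is_relation, auto_created and many_to_many."""
    if not field.is_relation:
        return "plain"
    if field.many_to_many:
        return "reverse_m2m" if field.auto_created else "forward_m2m"
    return "reverse_fk_or_o2o" if field.auto_created else "forward_fk_or_o2o"


# Stand-in objects with assumed flag values, mimicking get_fields() entries.
fields = [
    SimpleNamespace(name="id", is_relation=False, auto_created=True, many_to_many=False),
    SimpleNamespace(name="author", is_relation=True, auto_created=False, many_to_many=False),
    SimpleNamespace(name="comment", is_relation=True, auto_created=True, many_to_many=False),
    SimpleNamespace(name="comments", is_relation=True, auto_created=False, many_to_many=True),
]
labels = {f.name: classify(f) for f in fields}
```

A cloner could then copy rows for "forward_m2m" fields via add() and recurse into children only for the "reverse_fk_or_o2o" bucket.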

The value of a relation field (the related object or objects) can be accessed with getattr, through _meta, or with a query like:

children = field.related_model.objects.filter(
    **{field.remote_field.name: object}
)

This is valid both for relations and for fields with relations.


Notes:

  • because how far you want to clone relations up, down and sideways is probably very app-specific (include parents and their parents; children of children; relations with an fk on the model or relations with an fk to the model; follow fks to other models on children / parents), as is filtering which models are allowed and so on, it may be OK to have a clone method that is more tightly bound to the specific model structure

  • there are also hidden relations: the ones defined with related_name = "+" on a ForeignKey or ManyToManyField. These can still be discovered with the include_hidden parameter: object._meta.get_fields(include_hidden=True)

Upvotes: 0

Mario Orlandi

Reputation: 5849

Since you have an M2M relation, you will need to create new records in the related table. For that, add() seems more appropriate. You might try something like this:

for old_child in relation.related_model.objects.all():
    new_child = self.clone_object(old_child)
    # an M2M field cannot be assigned directly; add to its related manager instead
    getattr(new_child, remote_field_name).add(object)

Please note that this code is untested, so it might require some adjustments.

Upvotes: 0
