Matthijs Brouns
Matthijs Brouns

Reputation: 2329

How to prefetch descendants with django-treebeard's MP_Node?

I'm developing an application with a hierarchical data structure in django-rest-framework using django-treebeard. My (simplified) main model looks like this

class Task(MP_Node):
    name = models.CharField(_('name'), max_length=64)
    started = models.BooleanField(default=True)

What I'm currently trying to achieve is a list view of all root nodes which shows extra fields (such as whether all children have started). To do this I specified a view:

class TaskViewSet(viewsets.ViewSet):

    def retrieve(self, request, pk=None):
        queryset = Task.get_tree().filter(depth=1, job__isnull=True)
        operation = get_object_or_404(queryset, pk=pk)
        serializer = TaskSerializer(operation)
        return Response(serializer.data)

and serializer

class TaskSerializer(serializers.ModelSerializer):
    are_children_started = serializers.SerializerMethodField()

    def get_are_children_started(self, obj):
        return all(task.started for task in Task.get_tree(obj))

This all works and I get the expected results. However, I run into a N+1 query problem where for each root task I need to fetch all children separately. Normally this would be solvable using prefetch_related but as I use the Materialized Path structure from django-treebeard there are no Django relationships between the task models, so prefetch_related doesn't know what to do out of the box. I've tried to use custom Prefetch objects but as this still requires a Django relation path I could not get it to work.

My current idea is to extend the Task model with a foreign key pointing to its root node like such:

root_node = models.ForeignKey('self', null=True,
                              related_name='descendant_tasks',
                              verbose_name=_('root task')
                              )

in order to make the MP relationship explicit so it can be queried. However, this does feel like a bit of a non-dry method of doing it so I wonder whether anyone has another suggestion on how to tackle it.

Upvotes: 3

Views: 3259

Answers (2)

jnns
jnns

Reputation: 5644

If you want to avoid using a Foreign Key you can iterate over the queryset and re-create the tree structure in memory.

In my case I wanted to have a template tag (much like django-mptt's recursetree templatetag) to show multiple levels of nested pages with only one database query. Basically copying mptt.utils.get_cached_trees I ended up with this:

def get_cached_trees(queryset: QuerySet) -> list:
    """Return top-most pages / roots.

    Each page will have its children stored in `_cached_children` attribute
    and its parent in `_cached_parent`. This avoids having to query the database.
    """
    top_nodes: list = []
    path: list = []
    for obj in queryset:
        obj._cached_children = []
        if obj.depth == queryset[0].depth:
            add_top_node(obj, top_nodes, path)
        else:
            while not is_child_of(obj, parent := path[-1]):
                path.pop()
            add_child(parent, obj)

        if obj.numchild:
            path.append(obj)

    return top_nodes

def add_top_node(obj: MP_Node, top_nodes: list, path: list) -> None:
    top_nodes.append(obj)
    path.clear()

def add_child(parent: MP_Node, obj: MP_Node) -> None:
    obj._cached_parent = parent
    parent._cached_children.append(obj)

def is_child_of(child: MP_Node, parent: MP_Node) -> bool:
    """Return whether `child` is a sub page of `parent` without database query.

    `_get_children_path_interval` is an internal method of MP_Node.
    """
    start, end = parent._get_children_path_interval(parent.path)
    return start < child.path < end

It can be used like this to avoid the dreaded N+1 query problem:

for page in get_cached_trees(queryset):
    for child in page._cached_children:
        ... 

Upvotes: 1

Matthijs Brouns
Matthijs Brouns

Reputation: 2329

In the end I did end up with adding a foreign key to each task pointing to its root node like such:

root_node = models.ForeignKey('self', null=True,
                          related_name='descendant_tasks',
                          verbose_name=_('root task')
                          )

I updated my save method on my Task model to make sure I always point to the correct root node

def save(self, force_insert=False, force_update=False, using=None, update_fields=None):
    try:
        self.root_task = self.get_root()
    except ObjectDoesNotExist:
        self.root_task = None

    return super(Task, self).save(force_insert=False, force_update=False, using=None,
                                  update_fields=None
                                  )

and this allows me to simply prefetch all descendants using prefetch_related('descendants').

Whenever I need to have the descendants in a nested fashion I use the following function to nest the flattened list of descendants again

def build_nested(tasks):

    def get_basepath(path, depth):
        return path[0:depth * Task.steplen]

    container, link = [], {}
    for task in sorted(tasks, key=attrgetter('depth')):
        depth = int(len(task.path) / Task.steplen)
        try:
            parent_path = get_basepath(task.path, depth - 1)
            parent_obj = link[parent_path]
            if not hasattr(parent_obj, 'sub_tasks'):
                parent_obj.sub_tasks = []
            parent_obj.sub_tasks.append(task)
        except KeyError:  # Append it as root task if no parent exists
            container.append(task)

        link[task.path] = task

    return container

Upvotes: 3

Related Questions