casr
casr

Reputation: 1226

Reusing subqueries for ordering in Django ORM

I run a dog salon where dogs get haircuts on an infrequent basis. In order to encourage owners back I would like to send out vouchers for their next visit. The voucher will be based on whether a dog has had a haircut within the last 2 months to 2 years. Beyond 2 years ago we can assume that the customer has been lost and less than 2 months ago is too close to their previous haircut. We will first target owners that have recently visited.

My underlying database is PostgreSQL.

from datetime import timedelta
from django.db import models
from django.db.models import Max, OuterRef, Subquery
from django.utils import timezone


# Dogs have one owner, owners can have many dogs, dogs can have many haircuts

class Owner(models.model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=255)


class Dog(models.model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    owner = models.ForeignKey(Owner, on_delete=models.CASCADE, related_name="dogs")
    name = models.CharField(max_length=255)


class Haircut(models.model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    dog = models.ForeignKey(Dog, on_delete=models.CASCADE, related_name="haircuts")
    at = models.DateField()


today = timezone.now().date()
start = today - timedelta(years=2)
end = today - timedelta(months=2)

It strikes me that the problem can be broken down into two queries. The first is something that aggregates an owner's dogs to most recently cut within the last 2 months to 2 years.

dog_aggregate = Haircut.objects.annotate(Max("at")).filter(at__range=(start, end))

And then joins the result of that to the owners table.

owners_by_shaggiest_dog_1 = Owner.objects # what's the rest of this?

Resulting in SQL similar to:

select
  owner.id,
  owner.name
from
  (
    select
      dog.owner_id,
      max(haircut.at) last_haircut
    from haircut
      left join dog on haircut.dog_id = dog.id
    where
      haircut.at
        between current_date - interval '2' year
            and current_date - interval '2' month
    group by
      dog.owner_id
  ) dog_aggregate
  left join owner on dog_aggregate.owner_id = owner.id
order by
  dog_aggregate.last_haircut asc,
  owner.name;

Through some playing around I have managed to get the correct result with:

haircut_annotation = Subquery(
    Haircut.objects
    .filter(dog__owner=OuterRef("pk"), at__range=(start, end))
    .order_by("-at")
    .values("at")[:1]
)

owners_by_shaggiest_dog_2 = (
    Owner.objects
    .annotate(last_haircut=haircut_annotation)
    .order_by("-last_haircut", "name")
)

However, the resulting SQL seems inefficient as a new query is performed for every row:

select
  owner.id,
  owner.name,
  (
    select
    from haircut
      inner join dog on haircut.dog_id = dog.id
    where haircut.at
            between current_date - interval '2' year
                and current_date - interval '2' month
      and dog.owner_id = (owner.id)
    order by
      haircut.at asc
    limit 1
  ) last_haircut
from
  owner
order by
  last_haircut asc,
  owner.name;

P.S. I don't actually run a dog salon so I can't give you a voucher. Sorry!

Upvotes: 2

Views: 525

Answers (1)

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476534

Given I understood it correctly, you can make a query like:

from django.db.models import Max

Owners.objects.filter(
    dogs__haircuts__at__range=(start, end)
).annotate(
    last_haircut=Max('dogs__haircuts__at')
).order_by('last_haircut', 'name')

The last haircut should be the Maximum here, since as time passes by, the timestamp is larger.

Note however that your query and this query, does not exclude owners of dogs that have been washed more recently. We simply do not take that into account when we calculate the last_haircut.

If you want to exclude such owners, you should build a query like:

from django.db.models import Max

Owners.objects.exclude(
    dogs__haircuts__at__gt=end
).filter(
    dogs__haircuts__at__range=(start, end)
).annotate(
    last_haircut=Max('dogs__haircuts__at')
).order_by('last_haircut', 'name')

Upvotes: 1

Related Questions