Reputation: 456
Galaxies across the universe host millions/billions of stars, each belonging to a specific type, depending on its physical properties (Red stars, Blue Supergiant, White Dwarf, etc). For each Star in my database, I'm trying to find the number of distinct galaxies that are also home for some star of that same type.
class Galaxy(Model):
...
class Star(Model):
galaxy = ForeignKey(Galaxy, related_name='stars')
type = CharField(...)
Performing this query individually for each Star might be comfortably done by:
star = <some_Star>
desired_galaxies = Galaxy.objects.filter(stars__type=star.type).distinct()
desired_count = desired_galaxies.count()
Or even, albeit more redundant:
desired_count = Star.objects.filter(galaxy__stars__type=star.type).values('galaxy').distinct()
This get a little fuzzier when I try to get the count result for all the stars in a "single" query:
all_stars = Star.objects.annotate(desired_count=...)
The main reason I want to do that is to be capable of sorting Star.objects.order_by('desired_count')
in a clean way.
What I've tried so far:
Star.annotate(desired_count=Count('galaxy', filter=Q(galaxy__stars__type=F('type')), distinct=True))
But this annotates 1
for every star. I guess I'll have to go for OuterRef, Subquery
here, but not sure on how.
Upvotes: 1
Views: 228
Reputation: 5116
Django doesn't provide a way to define multi-valued relationships between models that don't involve foreign keys yet. If it did you could do something like
class Galaxy(Model):
...
class Star(Model):
galaxy = ForeignKey(Galaxy, related_name='stars')
type = CharField(...)
same_type_stars = Relation(
'self', from_fields=('type',), to_fields=('type',)
)
Star.objects.annotate(
galaxies_count=Count('same_type_stars__galaxy', distinct=True)
)
Which would result in something along
SELECT
star.*,
COUNT(DISTINCT same_star_type.galaxy_id) galaxies_count
FROM star
LEFT JOIN star same_star_type ON (same_star_type.type = star.type)
GROUP BY star.id
If you want to achieve something similar you'll need to use subquery for now
Star.objects.annotate(
galaxies_count=Subquery(Star.objects.filter(
type=OuterRef('type'),
).values('type').values(
inner_count=Count('galaxy', distinct=True),
))
)
Which would result in something along
SELECT
star.*,
(
SELECT COUNT(DISTINCT inner_star.galaxy_id)
FROM star inner_star
WHERE inner_star.type = star.type
GROUP BY inner_star.type
) galaxies_count
FROM star
Which likely perform poorly on some databases that don't materialize correlated subqueries (e.g. MySQL). In all cases make sure you index Star.type
otherwise you'll get bad performance no matter what. A composite index on ('type', 'galaxy')
might be even better as it might allow you to perform index only scan (e.g. on PostgreSQL).
Upvotes: 1
Reputation: 51988
You can use GROUP BY
to get the count:
Star.objects.values('type').annotate(desired_count=Count('galaxy')).values('type', 'desired_count')
Upvotes: 2