Salvatore Iovene
Salvatore Iovene

Reputation: 2323

How can I optimize this really slow query generated by Django?

here's my Django ORM query:

Group.objects.filter(public = True)\
    .annotate(num_members = Count('members', distinct = True))\
    .annotate(num_images = Count('images', distinct = True))\
    .order_by(sort)

Unfortunately this is taking over 30 seconds even with just a few dozen Groups. Removing the annotate statements makes the query significantly faster at only 3 ms...

My database backend is Postgres and here's the SQL and explain:

Executed SQL
SELECT ••• FROM "astrobin_apps_groups_group"
LEFT OUTER JOIN "astrobin_apps_groups_group_members" ON (
    "astrobin_apps_groups_group"."id" = "astrobin_apps_groups_group_members"."group_id"
)
LEFT OUTER JOIN "astrobin_apps_groups_group_images" ON (
"astrobin_apps_groups_group"."id" = "astrobin_apps_groups_group_images"."group_id")
WHERE "astrobin_apps_groups_group"."public" = true
GROUP BY
    "astrobin_apps_groups_group"."id", 
    "astrobin_apps_groups_group"."date_created", 
    "astrobin_apps_groups_group"."date_updated", 
    "astrobin_apps_groups_group"."creator_id", 
    "astrobin_apps_groups_group"."owner_id", 
    "astrobin_apps_groups_group"."name", 
    "astrobin_apps_groups_group"."description", 
    "astrobin_apps_groups_group"."category", 
    "astrobin_apps_groups_group"."public", 
    "astrobin_apps_groups_group"."moderated", 
    "astrobin_apps_groups_group"."autosubmission", 
    "astrobin_apps_groups_group"."forum_id"
ORDER BY "astrobin_apps_groups_group"."date_updated" ASC

Time
30455.9268951 ms


QUERY PLAN
GroupAggregate  (cost=5910.49..8288.54 rows=216 width=242) (actual time=29255.329..30269.284 rows=27 loops=1)
  ->  Sort  (cost=5910.49..6068.88 rows=63357 width=242) (actual time=29253.278..29788.601 rows=201888 loops=1)
        Sort Key: astrobin_apps_groups_group.date_updated, astrobin_apps_groups_group.id, astrobin_apps_groups_group.date_created, astrobin_apps_groups_group.creator_id, astrobin_apps_groups_group.owner_id, astrobin_apps_groups_group.name, astrobin_apps_groups_group.description, astrobin_apps_groups_group.category, astrobin_apps_groups_group.public, astrobin_apps_groups_group.moderated, astrobin_apps_groups_group.autosubmission, astrobin_apps_groups_group.forum_id
        Sort Method: external merge  Disk: 70176kB
        ->  Hash Right Join  (cost=15.69..857.39 rows=63357 width=242) (actual time=1.903..397.613 rows=201888 loops=1)
              Hash Cond: (astrobin_apps_groups_group_images.group_id = astrobin_apps_groups_group.id)
              ->  Seq Scan on astrobin_apps_groups_group_images  (cost=0.00..106.05 rows=6805 width=8) (actual time=0.024..12.510 rows=6837 loops=1)
              ->  Hash  (cost=12.31..12.31 rows=270 width=238) (actual time=1.853..1.853 rows=323 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 85kB
                    ->  Hash Right Join  (cost=3.63..12.31 rows=270 width=238) (actual time=0.133..1.252 rows=323 loops=1)
                          Hash Cond: (astrobin_apps_groups_group_members.group_id = astrobin_apps_groups_group.id)
                          ->  Seq Scan on astrobin_apps_groups_group_members  (cost=0.00..4.90 rows=290 width=8) (actual time=0.004..0.348 rows=333 loops=1)
                          ->  Hash  (cost=3.29..3.29 rows=27 width=234) (actual time=0.103..0.103 rows=27 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 7kB
                                ->  Seq Scan on astrobin_apps_groups_group  (cost=0.00..3.29 rows=27 width=234) (actual time=0.004..0.049 rows=27 loops=1)
                                      Filter: public
Total runtime: 30300.606 ms

It would be great if somebody could suggest a way to optimize this. I feel like I'm missing a really low hanging fruit.

Thanks!

Upvotes: 2

Views: 427

Answers (1)

Shaunak Sontakke
Shaunak Sontakke

Reputation: 1270

  1. What are the indexes present on astrobin_apps_groups_group and "astrobin_apps_groups_group_member, astrobin_apps_groups_group_image table?
  2. Is there any aggregate functions like SUM, COUNT used in your select? If no, then you can remove all columns from GROUP BY
  3. The plan shows most of the time is taken for sorting. If you create an index on date_updated filed with NULLS LAST with latest values first in index, then planner may use this index for sorting.
  4. For sorting, disk is getting used which is most costly affair. This is because your data collected for sorting is not fitting in memory. Try increasing the WORK_MEM - set work_mem='10MB'; SELECT.....

Upvotes: 1

Related Questions