Nathan Wienert
Nathan Wienert

Reputation: 1623

Slow Query with lots of joins

Apologies in advance for the mixed formatting, I broke it out a bit so it would look better.

I usually am able to determine the slow parts of queries using the cost, but in this one it seems really spread out and nothing too obvious. For some reason when I sort it by random() it slows down from a few milliseconds to a 8 seconds:

EXPLAIN for:
  SELECT songs.*
  FROM "songs"
    INNER JOIN "posts" ON "posts"."id" = "songs"."post_id"
    INNER JOIN broadcasts on broadcasts.song_id = songs.id
    inner join stations as ss on ss.id = broadcasts.station_id
    inner join artists on artists.station_slug = ss.slug
    inner join artists_genres on artists_genres.artist_id = artists.id
    inner join genres on genres.id = artists_genres.genre_id
    inner join broadcasts bb on bb.song_id = songs.id
    inner join stations sss on sss.id = bb.station_id
    inner join blogs b on b.station_slug = sss.slug
    inner join blogs_genres on blogs_genres.blog_id = b.id
    inner join genres gg on gg.id = blogs_genres.genre_id
    INNER JOIN "stations" on "stations"."blog_id" = "songs"."blog_id"
    INNER JOIN blogs on blogs.id = posts.blog_id
  WHERE "songs"."working" = 't'
    AND "songs"."processed" = 't'
    AND (genres.id = 20 and gg.id = 20)
    AND (NOT EXISTS(
      SELECT NULL
      FROM authors
      WHERE authors.song_id = songs.id
      AND authors.role IN ('remixer', 'mashup')
    ))
    AND (songs.source != 'youtube')
    AND (songs.seconds < 600)
    AND (songs.matching_id = songs.id)
    ORDER BY random() desc
                                                                                                             QUERY PLAN
 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=3770.97..3770.98 rows=1 width=2132)
    Sort Key: (random())
    ->  Nested Loop  (cost=3142.28..3770.96 rows=1 width=2132)
          Join Filter: ((b.station_slug)::text = (sss.slug)::text)
          ->  Nested Loop Anti Join  (cost=3142.28..3762.68 rows=1 width=2150)
                ->  Nested Loop  (cost=2526.24..3135.35 rows=1 width=2150)
                      Join Filter: (posts.blog_id = blogs.id)
                      ->  Nested Loop  (cost=2526.24..3128.65 rows=1 width=2127)
                            ->  Nested Loop  (cost=2526.24..3127.39 rows=1 width=2131)
                                  ->  Hash Join  (cost=2515.19..3112.87 rows=1 width=2113)
                                        Hash Cond: (stations.blog_id = songs.blog_id)
                                        ->  Seq Scan on stations  (cost=0.00..475.49 rows=19549 width=40)
                                        ->  Hash  (cost=2515.15..2515.15 rows=3 width=2077)
                                              ->  Nested Loop  (cost=1265.66..2515.15 rows=3 width=2077)
                                                    ->  Nested Loop  (cost=1265.66..2506.44 rows=1 width=2077)
                                                          ->  Nested Loop  (cost=1265.66..2499.97 rows=1 width=1809)
                                                                ->  Nested Loop  (cost=1265.66..2498.71 rows=1 width=1813)
                                                                      ->  Hash Join  (cost=1265.66..2357.90 rows=277 width=8)
                                                                            Hash Cond: (broadcasts.station_id = ss.id)
                                                                            ->  Seq Scan on broadcasts  (cost=0.00..895.98 rows=51598 width=8)
                                                                            ->  Hash  (cost=1264.35..1264.35 rows=105 width=8)
                                                                                  ->  Hash Join  (cost=714.50..1264.35 rows=105 width=8)
                                                                                        Hash Cond: ((ss.slug)::text = (artists.station_slug)::text)
                                                                                        ->  Seq Scan on stations ss  (cost=0.00..475.49 rows=19549 width=18)
                                                                                        ->  Hash  (cost=713.20..713.20 rows=104 width=18)
                                                                                              ->  Nested Loop  (cost=5.06..713.20 rows=104 width=18)
                                                                                                    ->  Bitmap Heap Scan on artists_genres  (cost=5.06..99.85 rows=104 width=8)
                                                                                                          Recheck Cond: (genre_id = 20)
                                                                                                          ->  Bitmap Index Scan on index_artists_genres_on_genre_id_and_artist_id  (cost=0.00..5.03 rows=104 width=0)
                                                                                                                Index Cond: (genre_id = 20)
                                                                                                    ->  Index Scan using artists_pkey on artists  (cost=0.00..5.89 rows=1 width=18)
                                                                                                          Index Cond: (artists.id = artists_genres.artist_id)
                                                                      ->  Index Scan using index_songs_on_shared_id on songs  (cost=0.00..0.50 rows=1 width=1805)
                                                                            Index Cond: (songs.matching_id = broadcasts.song_id)
                                                                            Filter: (songs.working AND songs.processed AND ((songs.source)::text <> 'youtube'::text) AND (songs.seconds < 600) AND (songs.id = songs.matching_id))
                                                                ->  Seq Scan on genres  (cost=0.00..1.25 rows=1 width=4)
                                                                      Filter: (genres.id = 20)
                                                          ->  Index Scan using posts_pkey on posts  (cost=0.00..6.46 rows=1 width=268)
                                                                Index Cond: (posts.id = songs.post_id)
                                                    ->  Index Scan using index_songs_stations_on_song_id_and_station_id on broadcasts bb  (cost=0.00..8.67 rows=3 width=8)
                                                          Index Cond: (bb.song_id = broadcasts.song_id)
                                  ->  Hash Join  (cost=11.05..14.39 rows=13 width=18)
                                        Hash Cond: (blogs_genres.blog_id = b.id)
                                        ->  Bitmap Heap Scan on blogs_genres  (cost=4.35..7.51 rows=13 width=8)
                                              Recheck Cond: (genre_id = 20)
                                              ->  Bitmap Index Scan on index_blogs_genres_on_genre_id_and_blog_id  (cost=0.00..4.35 rows=13 width=0)
                                                    Index Cond: (genre_id = 20)
                                        ->  Hash  (cost=5.20..5.20 rows=120 width=18)
                                              ->  Seq Scan on blogs b  (cost=0.00..5.20 rows=120 width=18)
                            ->  Seq Scan on genres gg  (cost=0.00..1.25 rows=1 width=4)
                                  Filter: (gg.id = 20)
                      ->  Seq Scan on blogs  (cost=0.00..5.20 rows=120 width=31)
                ->  Bitmap Heap Scan on authors  (cost=616.04..623.56 rows=2 width=4)
                      Recheck Cond: (authors.song_id = songs.id)
                      Filter: ((authors.role)::text = ANY ('{remixer,mashup}'::text[]))
                      ->  Bitmap Index Scan on index_authors_on_artist_id_and_song_id_and_role  (cost=0.00..616.04 rows=2 width=0)
                            Index Cond: (authors.song_id = songs.id)
          ->  Index Scan using stations_pkey on stations sss  (cost=0.00..8.27 rows=1 width=18)
                Index Cond: (sss.id = bb.station_id)

Any idea whats causing the slowdown? Using Postgres 9.1 on Mac.

Upvotes: 0

Views: 123

Answers (2)

Chris Travers
Chris Travers

Reputation: 26454

The question as I understand it is why order by random() would have a drastic effect on query speed. I don't know for sure and can't know without EXPLAIN ANALYSE for both versions of the query, but I suspect the issue has to do with handling of volatile functions. Volatile functions cannot be inlined and in this case it is likely that PostgreSQL is going back and adding a random() value to every row, and then sorting after the tables have already been scanned etc. This may be your problem.

A better approach might be to add the random() to the column list and then sort by alias or column number. This would allow random() to be calculated much earlier in the process when it could be done more efficiently.

Upvotes: 0

Zeb Rawnsley
Zeb Rawnsley

Reputation: 2220

You want to ensure you have built indexes for all columns that you're joining to. You should also try to avoid nested queries (from my experience) so you may need to re-write that NOT EXISTS() section.

Upvotes: 1

Related Questions