Alejandro Escobar

Reputation: 174

Postgres similar select query in two tables taking considerably longer in one table than in the other

I'm making two queries: one against a contacts table (1854453 total records) and one against a notes table (956467 total records). Although their query plans are very similar, the query against the notes table takes considerably longer, while the contacts query is fast. Below are the queries with their query plans:

Contacts query (0.9 ms):

Contact Load (0.9ms)  SELECT "contacts".* FROM "contacts" WHERE "contacts"."discarded_at" IS NULL AND "contacts"."firm_id" = $1 ORDER BY id DESC LIMIT $2  [["firm_id", 1], ["LIMIT", 2]]
=> EXPLAIN (ANALYZE,BUFFERS) SELECT "contacts".* FROM "contacts" WHERE "contacts"."discarded_at" IS NULL AND "contacts"."firm_id" = 1 ORDER BY id DESC LIMIT 2;

                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..11.27 rows=2 width=991) (actual time=5.407..5.412 rows=2 loops=1)
   Buffers: shared hit=7 read=70
   ->  Index Scan Backward using contacts_pkey on contacts  (cost=0.43..484798.76 rows=89438 width=991) (actual time=5.406..5.410 rows=2 loops=1)
         Filter: ((discarded_at IS NULL) AND (firm_id = 1))
         Rows Removed by Filter: 86
         Buffers: shared hit=7 read=70
 Planning Time: 0.271 ms
 Execution Time: 5.440 ms

Notes query (294.5 ms):

Note Load (294.5ms)  SELECT "notes".* FROM "notes" WHERE "notes"."firm_id" = $1 ORDER BY id DESC LIMIT $2  [["firm_id", 1], ["LIMIT", 2]]
=> EXPLAIN (ANALYZE,BUFFERS) SELECT "notes".* FROM "notes" WHERE "notes"."firm_id" = 1 ORDER BY id DESC LIMIT 2;

                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.88 rows=2 width=390) (actual time=387.278..387.280 rows=2 loops=1)
   Buffers: shared hit=29871 read=36815
   ->  Index Scan Backward using notes_pkey on notes  (cost=0.42..115349.39 rows=502862 width=390) (actual time=387.277..387.278 rows=2 loops=1)
         Filter: (firm_id = 1)
         Rows Removed by Filter: 271557
         Buffers: shared hit=29871 read=36815
 Planning Time: 5.389 ms
 Execution Time: 387.322 ms

Both tables have an index on the firm_id column, and the contacts table additionally has an index on the discarded_at column.
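The indexes look roughly like this (the index names here are illustrative, not the actual names):

-- Single-column indexes as described above; names are illustrative
CREATE INDEX index_contacts_on_firm_id      ON contacts (firm_id);
CREATE INDEX index_contacts_on_discarded_at ON contacts (discarded_at);
CREATE INDEX index_notes_on_firm_id         ON notes (firm_id);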

Is the difference in query time due to the number of rows that PostgreSQL has to check? If not, what could account for the difference? Let me know if any other information is necessary.

Upvotes: 0

Views: 61

Answers (1)

Laurenz Albe

Reputation: 246788

In both cases PostgreSQL reads the rows in index order to avoid an explicit sort, and keeps discarding rows that don't meet the filter condition until it has found two rows that match.

The difference is that in the first case the goal is reached after discarding only 86 rows, while in the second case almost 300,000 rows have to be scanned and discarded.
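A multicolumn index that covers both the filter and the sort order would let PostgreSQL read the two newest matching rows directly instead of scanning backward through the whole primary key; a minimal sketch (the index name is arbitrary):

-- Filters on firm_id and keeps rows ordered by id within each firm,
-- so a backward scan of this index satisfies ORDER BY id DESC ... LIMIT 2.
CREATE INDEX notes_firm_id_id_idx ON notes (firm_id, id);

With such an index, the Limit node can stop after reading two index entries for firm_id = 1, rather than discarding 271,557 non-matching rows along the way.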

Upvotes: 2
