Surendra Sharma
Surendra Sharma

Reputation: 33

Postgres not using index to sort data

I have a table learners which has around 3.2 million rows. This table contains user related information like name and email. I need to optimize some queries that uses order by on some column. So for testing I have created a temp_learners table, with 0.8 million rows. I have created two indexes on this table:

CREATE UNIQUE INDEX "temp_learners_companyId_userId_idx"
  ON temp_learners ("companyId" ASC, "userId" ASC, "learnerUserName" ASC, "learnerEmailId" ASC);

and

CREATE INDEX temp_learners_company_name_email_index
  ON temp_learners ("companyId", "learnerUserName", "learnerEmailId");

The second index is just for testing. Now When I run this query:

SELECT *
FROM temp_learners
WHERE "companyId" = 909666665757230431 AND "userId" IN (
                                                        4990609084216745771,
                                                        4990610022492247987,
                                                        4990609742667096366,
                                                        4990609476136523663,
                                                        5451985767018841230,
                                                        5451985767078553638,
                                                        5270390122102920730,
                                                        4763688819142650938,
                                                        5056979692501246449,
                                                        5279569274741647114,
                                                        5031660827132289520,
                                                        4862889373349389098,
                                                        5299864070077160421,
                                                        4740222596778406913,
                                                        5320170488686569878,
                                                        5270367618320474818,
                                                        5320170488587895729,
                                                        5228888485293847415,
                                                        4778050469432720821,
                                                        5270392314970177842,
                                                        4849087862439244546,
                                                        5270392117430427860,
                                                        5270351184072717902,
                                                        5330263074228870897,
                                                        4763688829301614114,
                                                        4763684609695916489,
                                                        5270390232949727716
  ) ORDER BY "learnerUserName","learnerEmailId";

The query plan used by db is this:

Sort  (cost=138.75..138.76 rows=4 width=1581) (actual time=0.169..0.171 rows=27 loops=1)
"  Sort Key: ""learnerUserName"", ""learnerEmailId"""
  Sort Method: quicksort  Memory: 73kB
  ->  Index Scan using "temp_learners_companyId_userId_idx" on temp_learners  (cost=0.55..138.71 rows=4 width=1581) (actual time=0.018..0.112 rows=27 loops=1)
"        Index Cond: ((""companyId"" = '909666665757230431'::bigint) AND (""userId"" = ANY ('{4990609084216745771,4990610022492247987,4990609742667096366,4990609476136523663,5451985767018841230,5451985767078553638,5270390122102920730,4763688819142650938,5056979692501246449,5279569274741647114,5031660827132289520,4862889373349389098,5299864070077160421,4740222596778406913,5320170488686569878,5270367618320474818,5320170488587895729,5228888485293847415,4778050469432720821,5270392314970177842,4849087862439244546,5270392117430427860,5270351184072717902,5330263074228870897,4763688829301614114,4763684609695916489,5270390232949727716}'::bigint[])))"
Planning time: 0.116 ms
Execution time: 0.191 ms

In this it does not sort on indexs. But when I run this query

SELECT *
FROM temp_learners
WHERE "companyId" = 909666665757230431
   ORDER BY "learnerUserName","learnerEmailId" limit 500;

This query uses indexs on sorting.

Limit  (cost=0.42..1360.05 rows=500 width=1581) (actual time=0.018..0.477 rows=500 loops=1)
  ->  Index Scan using temp_learners_company_name_email_index on temp_learners  (cost=0.42..332639.30 rows=122327 width=1581) (actual time=0.018..0.442 rows=500 loops=1)
        Index Cond: ("companyId" = '909666665757230431'::bigint)
Planning time: 0.093 ms
Execution time: 0.513 ms

What I am not able to understand is why postgre does not uses index in first query? Also, I want to clear out that the normal use case of this table learner is to join with other tables. So the first query I written is more similar to joins equation. So for example,

SELECT *
FROM temp_learners AS l
INNER JOIN entity_learners_basic AS elb
ON l."companyId" = elb."companyId" AND l."userId" = elb."userId"
WHERE l."companyId" = 909666665757230431 AND elb."gameId" = 1050403501267716928
ORDER BY "learnerUserName", "learnerEmailId" limit 5000;

Even after correcting indexes the query plan does not indexes for sorting.

QUERY PLAN
Limit  (cost=3785.11..3785.22 rows=44 width=1767) (actual time=163.554..173.135 rows=5000 loops=1)
  ->  Sort  (cost=3785.11..3785.22 rows=44 width=1767) (actual time=163.553..172.791 rows=5000 loops=1)
"        Sort Key: l.""learnerUserName"", l.""learnerEmailId"""
        Sort Method: external merge  Disk: 35416kB
        ->  Nested Loop  (cost=1.12..3783.91 rows=44 width=1767) (actual time=0.019..63.743 rows=21195 loops=1)
              ->  Index Scan using primary_index__entity_learners_basic on entity_learners_basic elb  (cost=0.57..1109.79 rows=314 width=186) (actual time=0.010..6.221 rows=21195 loops=1)
                    Index Cond: (("companyId" = '909666665757230431'::bigint) AND ("gameId" = '1050403501267716928'::bigint))
              ->  Index Scan using "temp_learners_companyId_userId_idx" on temp_learners l  (cost=0.55..8.51 rows=1 width=1581) (actual time=0.002..0.002 rows=1 loops=21195)
                    Index Cond: (("companyId" = '909666665757230431'::bigint) AND ("userId" = elb."userId"))
Planning time: 0.309 ms
Execution time: 178.422 ms

Does Postgres not use indexes when joining and ordering data?

Upvotes: 2

Views: 10219

Answers (2)

jjanes
jjanes

Reputation: 44202

PostgreSQL chooses the plan it thinks will be faster. Using the index that provides rows in the correct order means using a much less selective index, so it doesn't think that will be faster overall.

If you want to force PostgreSQL into believing that sorting is the worst thing in the world, you could set enable_sort=off. If it still sorts after that, then you know PostgreSQL doesn't have the right indexes to avoid sorting, as opposed to just thinking they will not actually be faster.

Upvotes: 5

Laurenz Albe
Laurenz Albe

Reputation: 246633

PostgreSQL could use an index on ("companyId", "learnerUserName", "learnerEmailId") for your first query, but the additional IN condition reduces the number of result rows to an estimated 4 rows, which means that the sort won't cost anything at all. So it chooses to use the index that can support the IN condition.

Rows returned with that index won't be in the correct order automatically, because

  1. you specified DESC for the last index column, but ASC to the preceding one

  2. you have more than one element in the IN list.

Without the IN condition, enough rows are returned, so that PostgreSQL thinks that it is cheaper to order by the index and filter out rows that don't match the condition.

With your first query, it is impossible to have an index that supports both the IN list in the WHERE condition and the ORDER BY clause, so PostgreSQL has to make a choice.

Upvotes: 3

Related Questions