Daniel
Daniel

Reputation: 253

Why doesn't Postgresql use index for IN query?

I have a table social_accounts with a partial index on column facebook_id where user_id IS NULL.

If I do a simple query WHERE facebook_id = '123', the index is used:

 => EXPLAIN for: SELECT "social_accounts".* FROM "social_accounts"  WHERE (user_id IS NOT NULL) AND "social_accounts"."facebook_id" = '123'
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using index_social_accounts_on_facebook_id on social_accounts  (cost=0.00..8.28 rows=1 width=345)
   Index Cond: ((facebook_id)::text = '123'::text)
   Filter: (user_id IS NOT NULL)

but if I do a query using IN it does not use the index:

 => EXPLAIN for: SELECT "social_accounts".* FROM "social_accounts"  WHERE (user_id IS NOT NULL) AND "social_accounts"."facebook_id" IN ('123', '456')
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on social_accounts  (cost=8.53..16.36 rows=2 width=345)
   Recheck Cond: ((facebook_id)::text = ANY ('{123,456}'::text[]))
   Filter: (user_id IS NOT NULL)
   ->  Bitmap Index Scan on index_social_accounts_on_facebook_id  (cost=0.00..8.52 rows=2 width=0)
         Index Cond: ((facebook_id)::text = ANY ('{123,456}'::text[]))
(5 rows)

why doesn't it use the index in the second case? any way to speed up this query?

(note that for this example I have truncated the array, and I've tested with many more elements but with same, slow, results)

Upvotes: 0

Views: 465

Answers (1)

Denis de Bernardy
Denis de Bernardy

Reputation: 78443

Actually, it is using an index. Just doing so differently.

An index scan visit rows one by one, going back and forth from one disk page to the next in random order.

A bitmap index scan starts by filtering disk pages to visit, and then visits the latter one by one sequentially. The recheck cond is because, in each page, you then need to filter out invalid rows.

For tiny numbers of rows, index scan is cheapest. For more rows, bitmap index scan becomes cheapest. For even larger numbers of rows, a seq scan eventually becomes cheapest.

Upvotes: 4

Related Questions