Postgres select performance with LIMIT 1

Question

I have a table with about 50 million records in PostgreSQL. Trying to select a post with most "likes" filtering by a "tag". Both fields have b-tree index. For "love" tag I get

EXPLAIN analyse select user_id from posts where tags @> array['love'] order by likes desc nulls last limit 12

Limit  (cost=0.57..218.52 rows=12 width=12) (actual time=2.658..14.243 rows=12 loops=1)
  ->  Index Scan using idx_likes on posts  (cost=0.57..55759782.55 rows=3070010 width=12) (actual time=2.657..14.239 rows=12 loops=1)
        Filter: (tags @> '{love}'::text[])
        Rows Removed by Filter: 10584
Planning time: 0.297 ms
Execution time: 14.276 ms

14 ms is great, but if I try to get it for "tamir", it suddenly becomes over 22 seconds!! Obviously query planner is doing something wrong.

EXPLAIN analyse select user_id from posts where tags @> array['tamir'] order by likes desc nulls last limit 12

Limit  (cost=0.57..25747.73 rows=12 width=12) (actual time=17552.406..22839.503 rows=12 loops=1)
  ->  Index Scan using idx_likes on posts  (cost=0.57..55759782.55 rows=25988 width=12) (actual time=17552.405..22839.484 rows=12 loops=1)
        Filter: (tags @> '{tamir}'::text[])
        Rows Removed by Filter: 11785083
Planning time: 0.253 ms
Execution time: 22839.569 ms

After reading the article I've added "user_id" to the ORDER BY and "tamir" is blazingly fast, 0.2ms! Now it's doing sorts and Bitmap Heap Scan instead of Index scan.

EXPLAIN analyse select user_id from posts where tags @> array['tamir'] order by likes desc nulls last, user_id limit 12

Limit  (cost=101566.17..101566.20 rows=12 width=12) (actual time=0.237..0.238 rows=12 loops=1)
  ->  Sort  (cost=101566.17..101631.14 rows=25988 width=12) (actual time=0.237..0.237 rows=12 loops=1)
        Sort Key: likes DESC NULLS LAST, user_id
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Bitmap Heap Scan on posts  (cost=265.40..100970.40 rows=25988 width=12) (actual time=0.074..0.214 rows=126 loops=1)
              Recheck Cond: (tags @> '{tamir}'::text[])
              Heap Blocks: exact=44
              ->  Bitmap Index Scan on idx_tags  (cost=0.00..258.91 rows=25988 width=0) (actual time=0.056..0.056 rows=126 loops=1)
                    Index Cond: (tags @> '{tamir}'::text[])
Planning time: 0.287 ms
Execution time: 0.277 ms

But what happens to "love"? Now it goes from 14 ms to 2.3 seconds...

EXPLAIN analyse select user_id from posts where tags @> array['love'] order by likes desc nulls last, user_id limit 12

Limit  (cost=7347142.18..7347142.21 rows=12 width=12) (actual time=2360.784..2360.786 rows=12 loops=1)
  ->  Sort  (cost=7347142.18..7354817.20 rows=3070010 width=12) (actual time=2360.783..2360.784 rows=12 loops=1)
        Sort Key: likes DESC NULLS LAST, user_id
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Bitmap Heap Scan on posts  (cost=28316.58..7276762.77 rows=3070010 width=12) (actual time=595.274..2171.571 rows=1517679 loops=1)
              Recheck Cond: (tags @> '{love}'::text[])
              Heap Blocks: exact=642705
              ->  Bitmap Index Scan on idx_tags  (cost=0.00..27549.08 rows=3070010 width=0) (actual time=367.080..367.080 rows=1517679 loops=1)
                    Index Cond: (tags @> '{love}'::text[])
Planning time: 0.226 ms
Execution time: 2360.863 ms

Can somebody shed some light on why this is happening and what would the fix.

Update

"tag" field had gin index, not b-tree, just typo.

Tometzky · Accepted Answer

B-tree indexes are not very useful for searching of element in array field. You should remove b-tree index from tags field and use gin index instead:

drop index idx_tags;
create index idx_tags using gin(tags);

And don't add order by user_id — this sabotages possibility to use your idx_likes for ordering when there's a lot of rows with the tag you search for.

Also likes field should probably be not null default 0.

Postgres select performance with LIMIT 1

Answers (1)

Related Questions