Sha
Sha

Reputation: 1181

What is the difference between Index-Only and Bitmap Index Scan in PostgreSQL?

In my query, I just want to call data with exact where conditions. These where conditions were created in index. Bu the explain shows bit index-scan. I couldn't understand why.

My query looks like below:

Select 
r.spend,
r.date,
...
from metadata m 
inner join 
report r
on m.org_id = r.org_id and m.country_or_region = r.country_or_region and m.campaign_id = r.campaign_id and m.keyword_id = r.keyword_id  
where r.org_id = 1 and m.keyword_type = 'KEYWORD'
offset 0  limit 20 

Indexes:

Metadata(org_id, keyword_type, country_or_region, campaign_id, keyword_id);
Report(org_id, country_or_region, campaign_id, keyword_id, date);

Explain Analyze:

"Limit  (cost=811883.21..910327.87 rows=20 width=8) (actual time=18120.268..18235.831 rows=20 loops=1)"
"  ->  Gather  (cost=811883.21..2702020.67 rows=384 width=8) (actual time=18120.267..18235.791 rows=20 loops=1)"
"        Workers Planned: 2"
"        Workers Launched: 2"
"        ->  Parallel Hash Join  (cost=810883.21..2700982.27 rows=160 width=8) (actual time=18103.440..18103.496 rows=14 loops=3)"
"              Hash Cond: (((r.country_or_region)::text = (m.country_or_region)::text) AND (r.campaign_id = m.campaign_id) AND (r.keyword_id = m.keyword_id))"
"              ->  Parallel Bitmap Heap Scan on report r  (cost=260773.11..2051875.83 rows=3939599 width=35) (actual time=552.601..8532.962 rows=3162553 loops=3)"
"                    Recheck Cond: (org_id = 479360)"
"                    Rows Removed by Index Recheck: 21"
"                    Heap Blocks: exact=20484 lossy=84350"
"                    ->  Bitmap Index Scan on idx_kr_org_date_camp  (cost=0.00..258409.35 rows=9455038 width=0) (actual time=539.329..539.329 rows=9487660 loops=1)"
"                          Index Cond: (org_id = 479360)"
"              ->  Parallel Hash  (cost=527278.08..527278.08 rows=938173 width=26) (actual time=7425.062..7425.062 rows=727133 loops=3)"
"                    Buckets: 65536  Batches: 64  Memory Usage: 2656kB"
"                    ->  Parallel Bitmap Heap Scan on metadata m  (cost=88007.61..527278.08 rows=938173 width=26) (actual time=1007.028..7119.233 rows=727133 loops=3)"
"                          Recheck Cond: ((org_id = 479360) AND ((keyword_type)::text = 'KEYWORD'::text))"
"                          Rows Removed by Index Recheck: 3"
"                          Heap Blocks: exact=14585 lossy=11054"
"                          ->  Bitmap Index Scan on idx_primary  (cost=0.00..87444.71 rows=2251615 width=0) (actual time=1014.631..1014.631 rows=2181399 loops=1)"
"                                Index Cond: ((org_id = 479360) AND ((keyword_type)::text = 'KEYWORD'::text))"
"Planning Time: 0.492 ms"
"Execution Time: 18235.879 ms"

In here, I just want to call 20 items. It should be more effective?

Upvotes: 8

Views: 5231

Answers (1)

richyen
richyen

Reputation: 9968

The Bitmap Index Scan happens when the result set will have a high selectivity rate with respect to the search conditions (i.e., there is a high percentage of rows that satisfy the search criteria). In this case, the planner will plan to scan the entire index, forming a bitmap of which pages on disk to pull the data out from (which happens during the Bitmap Heap Scan step). This is better than a Sequential Scan because it only scans the relevant pages on disk, skipping the pages that it knows relevant data does not exist. Depending on the statistics available to the optimizer, it may not be advantageous to do an Index Scan or an Index-Only Scan, but it is still better than a Sequential Scan.

To complete the answer to the question, an Index-Only Scan is a scan of the index that will pull the relevant data without having to visit the actual table. This is because the relevant data is already in the index. Take, for example, this table:

postgres=# create table foo (id int primary key, name text);
CREATE TABLE
postgres=# insert into foo values (generate_series(1,1000000),'foo');
INSERT 0 1000000

There is an index on the id column of this table, and suppose we call the following query:

postgres=# EXPLAIN ANALYZE SELECT * FROM foo WHERE id < 100;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.42..10.25 rows=104 width=8) (actual time=0.012..1.027 rows=99 loops=1)
   Index Cond: (id < 100)
 Planning Time: 0.190 ms
 Execution Time: 2.067 ms
(4 rows)

This query results in an Index scan because it scans the index for the rows that have id < 100, and then visits the actual table on disk to pull the other columns included in the * portion of the SELECT query.

However, suppose we call the following query (notice SELECT id instead of SELECT *):

postgres=# EXPLAIN ANALYZE SELECT id FROM foo WHERE id < 100;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Index Only Scan using foo_pkey on foo  (cost=0.42..10.25 rows=104 width=4) (actual time=0.019..0.996 rows=99 loops=1)
   Index Cond: (id < 100)
   Heap Fetches: 99
 Planning Time: 0.098 ms
 Execution Time: 1.980 ms
(5 rows)

This results in an Index-only scan because only the id column is requested, and that is included (naturally) in the index, so there's no need to visit the actual table on disk to retrieve anything else. This saves time, but its occurrence is very limited.

To answer your question about limiting to 20 results, the limiting occurs after the Bitmap Index Scan has occurred, so the runtime will still be the same whether you limit to 20, 40, or some other value. In the case of an Index/Index-Only Scan, the executor will stop scanning after it has acquired enough rows as specified by the LIMIT clause. In your case, with the Bitmap Heap Scan, this isn’t possible

Upvotes: 16

Related Questions