fasth
fasth

Reputation: 2342

PostgreSQL Bitmap Heap Scan on index is very slow but Index Only Scan is fast

I create a table with 43kk rows, populate them with values 1..200. So ~220k per each number spreaded through the table.

create table foo (id integer primary key, val bigint);
insert into foo
  select i, random() * 200 from generate_series(1, 43000000) as i;
create index val_index on foo(val);
vacuum analyze foo;
explain analyze select id from foo where val = 55;

Result: http://explain.depesz.com/s/fdsm

I expect total runtime < 1s, is it possible? I have SSD, core i5 (1,8), 4gb RAM. 9,3 Postgres.

If I use Index Only scan it works very fast:

explain analyze select val from foo where val = 55;

http://explain.depesz.com/s/7hm

But I need to select id not val so Incex Only scan is not suitable in my case.

Thanks in advance!

Additional info:

SELECT relname, relpages, reltuples::numeric, pg_size_pretty(pg_table_size(oid)) 
FROM pg_class WHERE oid='foo'::regclass;

Result:

"foo";236758;43800000;"1850 MB"

Config:

"cpu_index_tuple_cost";"0.005";""
"cpu_operator_cost";"0.0025";""
"cpu_tuple_cost";"0.01";""
"effective_cache_size";"16384";"8kB"
"max_connections";"100";""
"max_stack_depth";"2048";"kB"
"random_page_cost";"4";""
"seq_page_cost";"1";""
"shared_buffers";"16384";"8kB"
"temp_buffers";"1024";"8kB"
"work_mem";"204800";"kB"

Upvotes: 17

Views: 20565

Answers (3)

vyegorov
vyegorov

Reputation: 22865

Although you're querying only 0,5% of the table, or ~10MB worth of data (out of nearly 2GB table), values of interest are spread evenly across whole table.

You can see it in the first plan you've provided:

  • BitmapIndexScan completes in 123.172ms
  • BitmapHeapScan takes 17055.046ms.

You can try clustering your tables based on index order, which will put rows together on the same pages. On my SATA disks I have the following:

SET work_mem TO '300MB';
EXPLAIN (analyze,buffers) SELECT id FROM foo WHERE val = 55;

  Bitmap Heap Scan on foo  (...) (actual time=90.315..35091.665 rows=215022 loops=1)
    Heap Blocks: exact=140489
    Buffers: shared hit=20775 read=120306 written=24124

SET maintenance_work_mem TO '1GB';
CLUSTER foo USING val_index;
EXPLAIN (analyze,buffers) SELECT id FROM foo WHERE val = 55;

  Bitmap Heap Scan on foo  (...) (actual time=49.215..407.505 rows=215022 loops=1)
    Heap Blocks: exact=1163
    Buffers: shared read=1755

Of course, this is a one-time operation and it'll get longer bit-by-bit over the time.

Upvotes: 7

fasth
fasth

Reputation: 2342

I have got answer here: http://ask.use-the-index-luke.com/questions/235/postgresql-bitmap-heap-scan-on-index-is-very-slow-but-index-only-scan-is-fast

The trick is to use composite index for id and value:

create index val_id_index on foo(val, id);

So Index Only scan will be used, but I can select id now.

select id from foo where val = 55;

Result:

http://explain.depesz.com/s/nDt3

But this works ONLY in Postgres with version 9.2+. If you have forced to use versions below try another options.

Upvotes: 14

Pavel Stehule
Pavel Stehule

Reputation: 45795

You can try to reduce random_page_cost -- for SSD it can be 1. Second, you can increase a work_mem .. 10MB is relatively low value for current servers with gigabytes RAM. You should to recheck effective_cache_size - it can be too low too.

work_mem * max_connection * 2 + shared_buffers < RAM dedicated for Postgres
effective_cache ~ shared_buffers + file system cache

Upvotes: 1

Related Questions