Reputation: 303224
I have a PostgreSQL query that is taking longer than I'd like. I'm looking at the output of EXPLAIN ANALYZE
and it mentions a Bitmap Index Scan
. I've been searching the 'net and reading for about 10 minutes, but I cannot figure out:
Is a Bitmap Index a manufactured thing—something that I could improve if I added a real index to some column somewhere—or is it a specific type of a real index?
Here's the single table that I'm querying:
bugbot4b=> \d bug_snapshots
Table "public.bug_snapshots"
Column | Type | Modifiers
------------+-----------------------------+-----------
fixin_id | integer | not null
created_on | timestamp without time zone | not null
pain | integer | not null
status_id | integer | not null
Indexes:
"bug_snapshots_pkey" PRIMARY KEY, btree (fixin_id, created_on)
Foreign-key constraints:
"bug_snapshots_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE SET NULL
"bug_snapshots_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)
And here's the result of analyzing the query. Note that there are about 3k distinct fixin_id
literally within the query (elided in the below), and the table has 900k rows. Counting just those rows within the particular time range yields 15,000 rows.
EXPLAIN ANALYZE SELECT "created_on", sum("pain") AS "sum_pain" FROM "bug_snapshots"
WHERE (("fixin_id" IN (11,12,33,…,5351))
AND ("status_id" IN (2, 7, 5, 3))
AND ("created_on" >= '2013-10-08 16:42:26.994994-0700')
AND ("created_on" <= '2013-11-07 15:42:26.994994-0800')
AND ("pain" < 999))
GROUP BY "created_on"
ORDER BY "created_on";
Sort (cost=59559.33..59559.38 rows=20 width=12) (actual time=19.472..19.494 rows=30 loops=1)
Sort Key: created_on
Sort Method: quicksort Memory: 18kB
-> HashAggregate (cost=59558.64..59558.89 rows=20 width=12) (actual time=19.401..19.428 rows=30 loops=1)
-> Bitmap Heap Scan on bug_snapshots (cost=9622.42..59509.25 rows=9878 width=12) (actual time=6.849..13.420 rows=6196 loops=1)
Recheck Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
Filter: ((pain < 999) AND (status_id = ANY ('{2,7,5,3}'::integer[])))
-> Bitmap Index Scan on bug_snapshots_pkey (cost=0.00..9619.95 rows=11172 width=0) (actual time=6.801..6.801 rows=6196 loops=1)
Index Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
Total runtime: 19.646 ms
(10 rows)
Is the result of ANALYZE telling me that I need to add an index to the fixin_id (and/or other fields) to improve the speed? Or that this is just "slow" because of its size?
Upvotes: 3
Views: 1435
Reputation: 656666
Postgres doesn't have "bitmap indexes" per se. A "bitmap index scan" is an index access method that's allowed for certain index types (including default btree indexes). It's especially useful to combine multiple index lookups. The manual:
An index access method can support "plain" index scans, "bitmap" index scans, or both.
You can disable bitmap-scanning (for debugging purposes only!) by setting:
SET enable_bitmapscan = FALSE;
With long lists, joining to a derived table is often faster than a lengthy IN
expression. You can use VALUES
or unnest()
for that purpose. Or even a temporary table, possibly with indexes. See:
SELECT created_on, sum(pain) AS sum_pain
FROM unnest('{11,12,33,…,5351}'::int[]) AS f(fixin_id)
JOIN bug_snapshots USING (fixin_id)
WHERE status_id IN (2, 7, 5, 3)
AND created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
AND created_on <= '2013-11-07 15:42:26.994994-0800'::timestamptz
AND pain < 999
GROUP BY created_on
ORDER BY created_on;
A partial multicolumn index would probably help (a lot). That depends on details like data distribution, load, stable query conditions, etc. Most importantly, the selectivity of WHERE
expressions: a partial index typically only makes sense if many or most rows are excluded. Something like:
CREATE INDEX bug_snapshots_part_idx ON bug_snapshots (fixin_id, created_on, pain)
WHERE status_id IN (2, 7, 5, 3)
AND pain < 999;
The sequence of columns in the index matters. That is also true for your primary key, btw, which implements another multicolumn index. See:
Performance testing in fiddles is hardly reliable. Run your own tests! Also there were many improvements to Postgres since 2013, when this answer was written!
timestamp [without time zone]
One more thing: bug_snapshots.created_on
is type timestamp
. Timestamps are interpreted according to your current time zone setting.
But in the query you try to compare to literals with time zone (timestamptz
). This would work with an explicit cast:
WHERE created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
Your literal would be cast do timestamptz
and translated to your local time zone accordingly. However, since you don't provide the data type, Postgres casts your literal to the matching type timestamp
(not timestamptz
) ignoring the time zone offset. Most probably not your intention!
Consider this test:
SELECT min(created_on), max(created_on)
FROM bug_snapshots
WHERE created_on >= '2013-10-08 16:42:26.994994-0700'
AND created_on <= '2013-11-07 15:42:26.994994-0800'
See:
Upvotes: 4