Reputation: 1101
I'm using PostgreSQL 9.4.6.
I have the following entities:
-- user and group are reserved words in PostgreSQL, so they must be quoted
CREATE TABLE "user" (id CHARACTER VARYING NOT NULL PRIMARY KEY);
CREATE TABLE "group" (id CHARACTER VARYING NOT NULL PRIMARY KEY);
CREATE TABLE group_member (
id CHARACTER VARYING NOT NULL PRIMARY KEY,
gid CHARACTER VARYING REFERENCES "group"(id),
uid CHARACTER VARYING REFERENCES "user"(id));
I'm analyzing this query:
explain analyze select x2."gid" from "group_member" x2 where x2."uid" = 'a1';
I have several results. Before each run I flushed the OS caches and restarted PostgreSQL:
# /etc/init.d/postgresql stop
# sync
# echo 3 > /proc/sys/vm/drop_caches
# /etc/init.d/postgresql start
The results are:
1) cost=4.17..11.28 with indexes:
create index "group_member_gid_idx" on "group_member" ("gid");
create index "group_member_uid_idx" on "group_member" ("uid");
Bitmap Heap Scan on group_member x2 (cost=4.17..11.28 rows=3 width=32) (actual time=0.021..0.021 rows=0 loops=1)
Recheck Cond: ((uid)::text = 'a1'::text)
-> Bitmap Index Scan on group_member_uid_idx (cost=0.00..4.17 rows=3 width=0) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: ((uid)::text = 'a1'::text)
Planning time: 28.641 ms
Execution time: 0.359 ms
2) cost=7.97..15.08 with a composite index:
create unique index "group_member_gid_uid_idx" on "group_member" ("gid","uid");
Bitmap Heap Scan on group_member x2 (cost=7.97..15.08 rows=3 width=32) (actual time=0.013..0.013 rows=0 loops=1)
Recheck Cond: ((uid)::text = 'a1'::text)
-> Bitmap Index Scan on group_member_gid_uid_idx (cost=0.00..7.97 rows=3 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: ((uid)::text = 'a1'::text)
Planning time: 0.132 ms
Execution time: 0.047 ms
3) cost=0.00..16.38 without any indexes:
Seq Scan on group_member x2 (cost=0.00..16.38 rows=3 width=32) (actual time=0.002..0.002 rows=0 loops=1)
Filter: ((uid)::text = 'a1'::text)
Planning time: 42.599 ms
Execution time: 0.402 ms
Is result #3 the most effective? And why?
EDIT: In practice there will be many rows (more than 1 million) in the tables (group, user, group_member).
Upvotes: 1
Views: 738
Reputation: 1269443
When analyzing queries, the costs and query plans you see on small data sets are generally not a reliable guide to performance on larger data sets. And SQL is more concerned with large data sets than with trivially small ones.
Reading data from disk is often the driving factor in query performance. The main purpose of an index is to reduce the number of data pages that have to be read. If all the data in the table fits on a single page, there isn't much opportunity to reduce page reads: it takes the same amount of time to read one page whether it holds one record or 100. (Scanning the page to find the right record does incur some overhead, whereas an index would identify the specific record on the page.)
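If you want to see the page reads directly, EXPLAIN supports a BUFFERS option (available in your 9.4). A minimal sketch against the question's table:
EXPLAIN (ANALYZE, BUFFERS)
SELECT x2.gid FROM group_member x2 WHERE x2.uid = 'a1';
-- The "Buffers: shared hit=N read=M" lines report how many 8 kB pages
-- were found in cache (hit) versus fetched from disk (read).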
Indexes do incur overhead, but typically much, much less than reading an extra data page. The index itself needs to be read into memory, so two pages are read rather than one. One could argue that for a table that fits on one or two pages, an index is probably not a big advantage.
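You can check how many pages the table and index actually occupy from the pg_class catalog; a quick sketch, assuming the index name from the question:
SELECT relname, relpages, pg_relation_size(oid) AS bytes
FROM pg_class
WHERE relname IN ('group_member', 'group_member_uid_idx');
-- relpages is the planner's page count (refreshed by ANALYZE/VACUUM);
-- a near-empty table reports 0 or 1, i.e. a single 8 kB page.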
Although using the index does take longer in this case, performance differences measured in fractions of a millisecond are generally not germane to most database tasks. If you want to see the index do its work, put 100,000 rows in the table and run the same tests. You'll find that the version without the index scales roughly in proportion to the amount of data in the table, while the version with the index stays relatively constant (actually scaling more like the log of the number of records). A rough sketch of that experiment follows.
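A minimal sketch of that test, using your schema (the synthetic ids 'u1'…, 'm1'… are made up for illustration; gid is left NULL to keep it short):
INSERT INTO "user" (id)
SELECT 'u' || g FROM generate_series(1, 100000) AS g;
INSERT INTO group_member (id, uid)
SELECT 'm' || g, 'u' || g FROM generate_series(1, 100000) AS g;
ANALYZE group_member;
EXPLAIN ANALYZE
SELECT x2.gid FROM group_member x2 WHERE x2.uid = 'u42';
-- With group_member_uid_idx in place this stays a cheap index/bitmap scan;
-- drop the index and the seq scan's cost grows with the size of the table.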
Upvotes: 7