What is the reason of Seq Scan on a table?

Question

I have a sample calendar like application which stores events, their repetitions and event repeat rule. Here is database schema in PostgreSQL:

CREATE TABLE event
(
  id serial NOT NULL,
  title character varying(2000) NOT NULL,
  description character varying(2000) DEFAULT NULL::character varying,
  location character varying(2000) DEFAULT NULL::character varying,
  CONSTRAINT pk_event_id PRIMARY KEY (id)
)

CREATE TABLE event_repeat_rule
(
  id serial NOT NULL,
  event_id integer NOT NULL,
  start_date bigint NOT NULL,
  end_date bigint,
  count integer,
  repeat_type repeat_t NOT NULL,
  fixed_interval integer NOT NULL,
  day_of_month integer[] NOT NULL,
  day_of_week integer[] NOT NULL,
  week_of_month week_of_month_t[] NOT NULL,
  month_of_year integer[] NOT NULL,
  CONSTRAINT pk_event_repeat_rule PRIMARY KEY (id),
  CONSTRAINT fk_event_repeat_rule FOREIGN KEY (event_id)
      REFERENCES event (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE,
  CONSTRAINT uq_event_repeat_rule_event_id UNIQUE (event_id)
)

-- each event can be labeled with multiple tags. Tag table is not shown here.
CREATE TABLE event_tag
(
  id serial NOT NULL,
  event_id integer NOT NULL,
  tag_id integer NOT NULL,
  CONSTRAINT pk_event_tag_id PRIMARY KEY (id),
  CONSTRAINT fk_event_tag_event_id FOREIGN KEY (event_id)
      REFERENCES event (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE,
  CONSTRAINT fk_event_tag_tag_id FOREIGN KEY (tag_id)
      REFERENCES tag (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE,
  CONSTRAINT uq_evnet_tag_event_id_tag_id UNIQUE (event_id, tag_id)
)

CREATE INDEX idx_event_tag_tag_id
  ON event_tag
  USING btree
  (tag_id);

CREATE TABLE event_time
(
  id serial NOT NULL,
  event_id integer NOT NULL,
  start_time bigint NOT NULL,
  end_time bigint,
  CONSTRAINT pk_event_time_id PRIMARY KEY (id),
  CONSTRAINT fk_event_time_event_id FOREIGN KEY (event_id)
      REFERENCES event (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
)

CREATE INDEX idx_event_time_event_id_start_time_end_time
  ON event_time
  USING btree
  (event_id, start_time, end_time);

The overall description of schema: Each event has a repeat rule or not. each event can be labeled with tags.(many to many relation with Tag table). and all times of each event (single or repetitive) is in event_time table so the relation is 1 to many. There is an index on (event_id, start_time, end_time) in event_time table.

I query this schema based on tag_id and start_time. This is my query:

SELECT * FROM 
event_time 
JOIN event ON event_time.event_id = event.id 
JOIN event_tag ON event_tag.event_id = event.id 
LEFT OUTER JOIN event_repeat_rule ON event.id = event_repeat_rule.event_id 
WHERE event_tag.tag_id = 1 
AND event_time.start_time <= 1411465037 
AND event_time.end_time >= 1408873037;

When I run this query with EXPLAIN on this query, I get this:

Nested Loop Left Join  (cost=3.08..15.75 rows=2 width=587)
  ->  Hash Join  (cost=2.93..9.75 rows=2 width=423)
        Hash Cond: (event_time.event_id = event.id)
        ->  Seq Scan on event_time  (cost=0.00..6.69 rows=22 width=24)
              Filter: ((start_time <= 1411465037) AND (start_time >= 1408873037))
        ->  Hash  (cost=2.87..2.87 rows=5 width=399)
              ->  Hash Join  (cost=1.52..2.87 rows=5 width=399)
                    Hash Cond: (event.id = event_tag.event_id)
                    ->  Seq Scan on event  (cost=0.00..1.17 rows=17 width=386)
                    ->  Hash  (cost=1.45..1.45 rows=6 width=13)
                          ->  Seq Scan on event_tag  (cost=0.00..1.45 rows=6 width=13)
                                Filter: (tag_id = 1)
  ->  Index Scan using uq_event_repeat_rule_event_id on event_repeat_rule  (cost=0.15..2.99 rows=1 width=164)
        Index Cond: (event.id = event_id)

I'm getting Seq Scan on almost all tables. Low number of records might be the cause. But I don't want design based on estimation. Is my index on event_time table which is (event_id, start_time, end_time) can satisfy this query?

Mike Sherrill &#39;Cat Recall&#39; · Accepted Answer

I query this schema based on tag_id and start_time.

You query on "tag_id" and "start_time". You want to know whether your query can use an index on {"event_id", "start_time", "end_time"}?

No, it can't use that index. That index doesn't contain "tag_id" at all, and "start_time" isn't the first column. A query that uses "event_id" and "start_time" in a WHERE clause should use that index, though.

The columns "tag_id" and "start_time" are in different tables. There's an existing index on the pair of columns {"event_tag"."event_id", "event_tag"."tag_id". (The UNIQUE constraint on those columns is implemented with a unique index.) But that index isn't usable for queries that refer only to "event_tag"."tag_id".

Same for "start_time". The column "event_id" is first in the index, so that index probably won't be used for queries that don't also refer to "event_time"."event_id".

I'd try adding these two indexes . . .

create index on event_tag (tag_id);
create index on event_time (start_time, end_time);

Then load a million rows of random data, analyze the tables, and look at the query plan again.

I see no compelling reason for "start_time" and "end_time" to be in a separate table. Think about moving those columns into the table "event".

Event titles are not declared unique. That means you might (will) end up with multiple events having the same title.

Using id numbers on every table is kind of an anti-pattern. For example, the column "event_tag"."id" serves no purpose aside from slowing down queries. (It's meaningless; it makes the table wider, so fewer rows fit on a data page; it's redundant, because the other pair of columns is declared unique; etc.)

What is the reason of Seq Scan on a table?

Answers (1)

Related Questions