David Lio
David Lio

Reputation: 327

How do I optimize this full text search query? (tsvector) (1.5 mil rows)

I currently have a full text search query in PostgreSQL (see below) that scans through a single table of 1.5 million rows to find all the items that match "All" terms as well as "Any" term.

The query executes correctly and at a mediocre speed (~2-3 sec) on queries that have few results. And at terrible speeds on results that are 100,000+ matches (~15-100 sec)

The query first orders results by term type (all terms matched, then any terms matched) and then sub-orders results by a ts_rank_cd calculation of relevance. (As well as more simple variations where it orders by already known columns that can be indexed such as duration)

SELECT
  *,
  ts_rank_cd(textsearchable, query_any_terms) AS relevance,
  textsearchable @@ query_all_terms AS all_terms,
  sum(1) over (PARTITION BY textsearchable @@ query_all_terms) AS match_method_total,
  sum(1) over () AS all_matched_total
FROM
  videos,
  to_tsquery(?) AS query_any_terms,
  to_tsquery(?) AS query_all_terms
WHERE
  website IN (?)
  AND textsearchable @@ query_any_terms
  AND duration_in_seconds >= ?
  AND duration_in_seconds <= ?
ORDER BY
  all_terms DESC, 
  relevance DESC 
LIMIT ? 
OFFSET ?

All relevant columns have been indexed and system monitoring reveals that the memory and cpu aren't being utilized to their fullest and the bottle neck appears to be Disk IO.

The server is an Ubuntu Server 10.04. Memory and cpu power can be easily increased via the backend dynamically as needed to meet the solution.

Currently the database is static when in production and writes to the database will not happen on the production server. Therefore it is possible to completely and accurately generate an index that remains accurate, if that is beneficial.

Any help to finding any avenue of improvement would be greatly appreciated. Additional information can be provided in a timely manner upon request.

Upvotes: 3

Views: 897

Answers (1)

David Lio
David Lio

Reputation: 327

Loaded the DB into ram with tmpfs and the query time improved considerably (i.e. it went from ~100ish seconds to ~2ish seconds).

see: http://www.slideshare.net/pgconf/five-steps-perform2009

Upvotes: 1

Related Questions