TheAJ
TheAJ

Reputation: 10875

Postgres full-text search performance for common words

I am struggling to improve the search performance when someone tries to query for something that is very common. I have a database that contains 5.3 million records with their mailing address, and a common word in a large portion of them is "road", "rd", "st", etc... So when someone searches for this, it takes a very long time.

As you can see below, I try to search for something that is not very common (arrowhead):

pulsar_dev=# EXPLAIN ANALYZE SELECT
                property->>'rollNumber',
                property->>'municipalAddress',
                property->>'municipalityDescription'
FROM
                properties_cmv
WHERE
                to_tsvector('simple', property->>'municipalAddress') ||
                to_tsvector('simple', property->>'municipalityDescription') ||
                to_tsvector('simple', property->>'countyDescription') @@ plainto_tsquery('arrowhead')
ORDER BY ts_rank(to_tsvector('simple', property->>'municipalAddress') ||
                to_tsvector('simple', property->>'municipalityDescription') ||
                to_tsvector('simple', property->>'countyDescription'), plainto_tsquery('arrowhead')) DESC;
                                                                                                                                                    QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4420.99..4424.11 rows=1248 width=23) (actual time=136.957..137.047 rows=490 loops=1)
   Sort Key: (ts_rank(((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDesc
ription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))), plainto_tsquery('arrowhead'::text)))
   Sort Method: quicksort  Memory: 93kB
   ->  Bitmap Heap Scan on properties_cmv  (cost=25.69..4356.81 rows=1248 width=23) (actual time=0.350..136.566 rows=490 loops=1)
         Recheck Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDe
scription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('arrowhead'::text))
         Heap Blocks: exact=39
         ->  Bitmap Index Scan on prop_address_idx  (cost=0.00..25.38 rows=1248 width=0) (actual time=0.072..0.072 rows=490 loops=1)
               Index Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipali
tyDescription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('arrowhead'::text))
 Planning time: 0.213 ms
 Execution time: 137.184 ms
(10 rows)

It's pretty fast, but when I search for "road", it is not fast at all:

pulsar_dev=# EXPLAIN ANALYZE SELECT
                property->>'rollNumber',
                property->>'municipalAddress',
                property->>'municipalityDescription'
FROM
                properties_cmv
WHERE
                to_tsvector('simple', property->>'municipalAddress') ||
                to_tsvector('simple', property->>'municipalityDescription') ||
                to_tsvector('simple', property->>'countyDescription') @@ plainto_tsquery('road')
ORDER BY ts_rank(to_tsvector('simple', property->>'municipalAddress') ||
                to_tsvector('simple', property->>'municipalityDescription') ||
                to_tsvector('simple', property->>'countyDescription'), plainto_tsquery('road')) DESC;
                                                                                                                                                  QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=25533.10..25560.73 rows=11051 width=23) (actual time=11065.051..11066.883 rows=10356 loops=1)
   Sort Key: (ts_rank(((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDesc
ription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))), plainto_tsquery('road'::text)))
   Sort Method: quicksort  Memory: 1841kB
   ->  Bitmap Heap Scan on properties_cmv  (cost=117.67..24790.93 rows=11051 width=23) (actual time=1.911..11052.683 rows=10356 loops=1)
         Recheck Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDe
scription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('road'::text))
         Heap Blocks: exact=1408
         ->  Bitmap Index Scan on prop_address_idx  (cost=0.00..114.91 rows=11051 width=0) (actual time=1.432..1.432 rows=10356 loops=1)
               Index Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipali
tyDescription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('road'::text))
 Planning time: 0.210 ms
 Execution time: 11069.142 ms
(10 rows)

How can I improve the performance of the second query? I also need to rank the results, returning the most relevant results first.

Running a similar test on elasticsearch returns in milliseconds.

Upvotes: 1

Views: 561

Answers (1)

TheAJ
TheAJ

Reputation: 10875

I created a new table and saved the concatenated tsvector in a column and index that, it seems to have improved the speed.

Upvotes: 3

Related Questions