Reputation: 10875
I am struggling to improve the search performance when someone tries to query for something that is very common. I have a database that contains 5.3 million records with their mailing address, and a common word in a large portion of them is "road", "rd", "st", etc... So when someone searches for this, it takes a very long time.
As you can see below, I try to search for something that is not very common (arrowhead):
pulsar_dev=# EXPLAIN ANALYZE SELECT
property->>'rollNumber',
property->>'municipalAddress',
property->>'municipalityDescription'
FROM
properties_cmv
WHERE
to_tsvector('simple', property->>'municipalAddress') ||
to_tsvector('simple', property->>'municipalityDescription') ||
to_tsvector('simple', property->>'countyDescription') @@ plainto_tsquery('arrowhead')
ORDER BY ts_rank(to_tsvector('simple', property->>'municipalAddress') ||
to_tsvector('simple', property->>'municipalityDescription') ||
to_tsvector('simple', property->>'countyDescription'), plainto_tsquery('arrowhead')) DESC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=4420.99..4424.11 rows=1248 width=23) (actual time=136.957..137.047 rows=490 loops=1)
Sort Key: (ts_rank(((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDesc
ription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))), plainto_tsquery('arrowhead'::text)))
Sort Method: quicksort Memory: 93kB
-> Bitmap Heap Scan on properties_cmv (cost=25.69..4356.81 rows=1248 width=23) (actual time=0.350..136.566 rows=490 loops=1)
Recheck Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDe
scription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('arrowhead'::text))
Heap Blocks: exact=39
-> Bitmap Index Scan on prop_address_idx (cost=0.00..25.38 rows=1248 width=0) (actual time=0.072..0.072 rows=490 loops=1)
Index Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipali
tyDescription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('arrowhead'::text))
Planning time: 0.213 ms
Execution time: 137.184 ms
(10 rows)
It's pretty fast, but when I search for "road", it is not fast at all:
pulsar_dev=# EXPLAIN ANALYZE SELECT
property->>'rollNumber',
property->>'municipalAddress',
property->>'municipalityDescription'
FROM
properties_cmv
WHERE
to_tsvector('simple', property->>'municipalAddress') ||
to_tsvector('simple', property->>'municipalityDescription') ||
to_tsvector('simple', property->>'countyDescription') @@ plainto_tsquery('road')
ORDER BY ts_rank(to_tsvector('simple', property->>'municipalAddress') ||
to_tsvector('simple', property->>'municipalityDescription') ||
to_tsvector('simple', property->>'countyDescription'), plainto_tsquery('road')) DESC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=25533.10..25560.73 rows=11051 width=23) (actual time=11065.051..11066.883 rows=10356 loops=1)
Sort Key: (ts_rank(((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDesc
ription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))), plainto_tsquery('road'::text)))
Sort Method: quicksort Memory: 1841kB
-> Bitmap Heap Scan on properties_cmv (cost=117.67..24790.93 rows=11051 width=23) (actual time=1.911..11052.683 rows=10356 loops=1)
Recheck Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDe
scription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('road'::text))
Heap Blocks: exact=1408
-> Bitmap Index Scan on prop_address_idx (cost=0.00..114.91 rows=11051 width=0) (actual time=1.432..1.432 rows=10356 loops=1)
Index Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipali
tyDescription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('road'::text))
Planning time: 0.210 ms
Execution time: 11069.142 ms
(10 rows)
How can I improve the performance of the second query? I also need to rank the results, returning the most relevant results first.
Running a similar test on elasticsearch returns in milliseconds.
Upvotes: 1
Views: 561
Reputation: 10875
I created a new table and saved the concatenated tsvector in a column and index that, it seems to have improved the speed.
Upvotes: 3