Reputation: 2337
We have a musicians table containing records with multiple string fields, say:
I want to pass postgres a long string, say:
"It is known that Jimi liked to set light to his guitar and smash up all the drums while on stage."
and i want to get returned the fields that have any matches - preferably in order of the most matches first:
because i need the search to be case insensitive, i'm constructing a query like this...
select * from musicians where lowercase_string like '%'||firstname||'%' or lowercase_string like '%'||lastname||'%' or lowercase_string like '%'||instrument||'%'
and then looping through (in ruby in my case) to capture the result with the most matches.
this is however very slow in the sql stage (1 minute+).
i've tried adding lower-case GIN index using pg_trgm as suggested here - but it's not helping - presumably because the like query is back to front?
Thanks!
Upvotes: 0
Views: 2762
Reputation: 36274
With my testing, it seems that no trigram index could help your query at all. And no other index type could possibly speed up an (I)LIKE / FTS based search.
I should mention that all of the queries below use the trigram indexes, when they are queried "reversed": when the table contains the document (which is indexed), and your parameter is the query. The (I)LIKE variant variant f.ex. 2-3 times faster with it.
These the queries I've tested:
select *
from musicians
where :input_string ilike '%' || firstname || '%'
or :input_string ilike '%' || lastname || '%'
or :input_string ilike '%' || instrument || '%'
At first, FTS seemed a great idea, but my testing shows that even without ranking, it is 60-100 times slower than the (I)LIKE variant. (So even, when you don't have to post-process results with these methods, these are not worth it).
select *
from musicians
where to_tsvector(:input_string) @@ (plainto_tsquery(firstname) || plainto_tsquery(lastname) || plainto_tsquery(lastname))
However, ORDER BY rank
doesn't slow down that much further: it is 70-120 times slower than the (I)LIKE variant.
select *
from musicians
where to_tsvector(:input_string) @@ (plainto_tsquery(firstname) || plainto_tsquery(lastname) || plainto_tsquery(lastname))
order by ts_rank(to_tsvector(:input_string), plainto_tsquery(firstname) || plainto_tsquery(lastname) || plainto_tsquery(lastname))
Then, for a last effort, I tried the (fairly new) "word similarity" operators of the trigram module: <%
and %>
(available from PostgreSQL 9.6).
select *
from musicians
where :input_string %> firstname
or :input_string %> lastname
or :input_string %> instrument
select *
from musicians
where firstname <% :input_string
or lastname <% :input_string
or instrument <% :input_string
These were somewhat faster then FTS: around 50-70 times slower than the (I)LIKE variant.
(Partially working) rextester: it is run against PostgreSQL 9.5, so the 9.6 operators obviously won't run here.
Update: IF full word match is enough for you, you can actually reverse your query, to be able to use indexes. You'll need to "parse" your query (aka. "long string") though:
with long_string(ls) as (
values (:input_string)
),
words(word) as (
select s
from long_string, regexp_split_to_table(ls, '[^[:alnum:]]+') s
where s <> ''
)
select musicians.*
from musicians, words
where firstname ilike word
or lastname ilike word
or instrument ilike word
group by musicians.id
Note: I parsed the query for every complete word. You can have some other logic there, or it can even be parsed in client side.
The default, btree
index shines here, as it is much faster than the trigram index with (I)LIKE (we won't need them anyway, as we are looking for complete word match here):
with long_string(ls) as (
values (:input_string)
),
words(word) as (
select s
from long_string, regexp_split_to_table(lower(ls), '[^[:alnum:]]+') s
where s <> ''
)
select musicians.*
from musicians, words
where lower(firstname) = word
or lower(lastname) = word
or lower(instrument) = word
group by musicians.id
http://rextester.com/PSABJ6745
You could even get the match count with something like
sum((lower(firstname) = word)::int
+ (lower(lastname) = word)::int
+ (lower(instrument) = word)::int)
Upvotes: 4
Reputation: 125574
The ilike
option with match ordering:
with long_string (ls) as (values
('It is known that Jimi liked to set light to his guitar and smash up all the drums while on stage.')
)
select musicians.*, matches
from
musicians
cross join
long_string
cross join lateral
(select
(ls ilike format ('%%%s%%', first_name) and first_name != '')::int +
(ls ilike format ('%%%s%%', last_name) and last_name != '')::int +
(ls ilike format ('%%%s%%', instrument) and instrument != '')::int
as matches
) m
where matches > 0
order by matches desc
;
first_name | last_name | instrument | matches
------------+-----------+------------+---------
Jimi | Hendrix | Guitar | 2
Phil | Collins | Drums | 1
Ringo | Starr | Drums | 1
Upvotes: 2