Reputation: 518
Suppose I have a table named movie
with a field name
, that is a VARCHAR
field.
I want to implement a search bar in my website, so that when a user enters a string I can query movies with that name.
My first approach was quite naive:
select *
from movie
where name like '%user_string%';
The limitations are:
My idea was to create an additional normalized_name
field, that is computed using the name
field with all special characters and accents stripped. Then the query would become:
select *
from movie
where normalized_name like '%user_string%';
For instance: user searches for pokemon
, the database query return a movie with the normalized_name
= pokemon
, which real name is Pokémon
. Obviously the user string would be first normalized too - in order to allow searching by the movie real name too.
Now, is this a valid approach? What is the most widely used one - that may also make the search even better? Is there any literature on the matter?
Upvotes: 2
Views: 1100
Reputation: 246163
Create a trigram index on a stripped version of the column:
Create the necessary extensions and create an immutable unaccent
called f_unaccent
(see here for more information):
CREATE EXTENSION pg_trgm;
CREATE EXTENSION unaccent;
CREATE OR REPLACE FUNCTION public.immutable_unaccent(regdictionary, text)
RETURNS text LANGUAGE c IMMUTABLE PARALLEL SAFE STRICT AS
'$libdir/unaccent', 'unaccent_dict';
CREATE OR REPLACE FUNCTION public.f_unaccent(text)
RETURNS text LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT public.immutable_unaccent(regdictionary 'public.unaccent', $1)
$func$;
Create the trigram index on the column:
CREATE INDEX ON movie USING gin (translate(f_unaccent(name), '''', '') gin_trgm_ops);
Now perform the following query:
SELECT * FROM movie
WHERE translate(f_unaccent(name), '''', '') ILIKE translate(f_unaccent('user_string'), '''');
Upvotes: 2