Reputation: 2667
I am using PostgreSQL 10 and, even though I have some experience writing queries, I don't now that much about DBMSs. I want that my query be more "intelligent" and not just match exact words. I made a humble research, and it seems that to accomplish this we have many features. First, I thought about using RegEx on the query, but it seems not a good idea, because RegEx are fairly limited. I am sure that Postgres has some better feature more suitable for this.
To implement my smart search I came up with an idea like this: The user can forget a letter in each word or have repeated letters that will not cause a mismatch. Also letters like ç and c should match. This idea is not the perfect reproduction of the problem that I have to solve I am only exposing my problem.
For example:
If a user searches for "macs vermelhas". I should return results containing "macas vermelhas", "maças vermelhas", "macs vermelhas", "macs vrmelhas".
The system is in Portuguese and I have pretty much the same search challenges of Spanish language (ê, ü, ã and etc).
Upvotes: 1
Views: 3225
Reputation: 246443
I think that trigram indexes are a perfect fit for you.
CREATE EXTENSION pg_trgm;
Let's have a table like that:
CREATE TABLE words (word text);
INSERT INTO words VALUES ('maças vermelhas');
INSERT INTO words VALUES ('alguna cosa');
CREATE INDEX ON words USING gin (word gin_trgm_ops);
Now you can find words by similarity:
SELECT * FROM words WHERE word % 'macs vermelhas';
word
-----------------
maças vermelhas
(1 row)
The index can be used (I force it in this case):
SET enable_seqscan=off;
EXPLAIN (COSTS OFF)
SELECT * FROM words WHERE word % 'macs vermelhas';
QUERY PLAN
-----------------------------------------------------
Bitmap Heap Scan on words
Recheck Cond: (word % 'macs vermelhas'::text)
-> Bitmap Index Scan on words_word_idx
Index Cond: (word % 'macs vermelhas'::text)
(4 rows)
Upvotes: 2