Postgres queries contaning non-alphabetical characters

I'm developing a search for entities by name in Postgres database. Name usually consists of 1-3 words and may contain symbols &, !, (, ), -, etc.

I'm using gin trigram index and queries: WHERE name ILIKE '%something%', ILIKE 'a%' and WHERE name % 'abc' for fuzzy search (if nothing was found by the exact match via ILIKE). The problem is that we need to support search by any characters, not only letters and numbers. Trigram index ignores such characters.

I've tried text_pattern_ops index for this case but without any success: queries such as WHERE name ILIKE '%$%' are extremely slow

So, is there any way to efficiently process such queries? Do I need a full text search for this purpose?

UPD:

Table is like:

id (int) Name (text)
123 Dolce&Gabbana

Queries are like:

SELECT name FROM brand WHERE name ILIKE '%&%' ORDER BY name;

UPD2:

Query plan for

EXPLAIN(analyze, verbose, buffers, settings)
SELECT name FROM brand
WHERE name ILIKE '%$%'
ORDER BY name

enter image description here

Index was created as:

CREATE INDEX brand_trgm_idx ON brand USING gin (name gin_trgm_ops);

Table was created as:

CREATE TABLE brand (
    id              serial PRIMARY KEY,
    name            TEXT,
    collection_id   TEXT,
    created_at      TIMESTAMP DEFAULT now() NOT NULL,
    created_by      TEXT                    NOT NULL
);

Also tried:

CREATE INDEX brand_name_idx ON brand (name text_pattern_ops);

UPD3:

Checked query analyze for the same db, but with ~1M entries:

enter image description here

Upvotes: 1

Views: 96

Answers (1)

jjanes
jjanes

Reputation: 44157

trgm.h does #define KEEPONLYALNUM. If you removed that and recompiled, it would then keep punctuation characters other than spaces. However, '%$%' doesn't have any usable trigrams in it, just like '%a%' doesn't, as it is too short. So the one concrete example you showed us would still not use the index.

It is also rather hazardous to do this, as then upgrading your system could cause your changed binaries to silently be lost. It would be better to fork and rename, but that is a lot of work.

Upvotes: 3

Related Questions