james
james

Reputation: 491

Postgres Multiple Column Text Indexing

If I have a table with millions of records where I am performing a wildcard search, should I be indexing on all columns?

Example:

table.account_id = X 
AND (table.col1 LIKE '%abc%' 
     OR table.col2 LIKE %abc% 
     OR LOWER(table.col3) LIKE %abc% 
     OR LOWER(table.col4) LIKE %abc%)

That is returning in about 1-2s depending on what I am searching. However, I need this to be < .5s since it is an autocomplete field.

My question is if I should be creating an index such as:

CREATE INDEX ix_xxxx ON table (col1, col2,lower(col3),lower(col4), owner_firm_id)

or should I be looking at to_tsvector or text_pattern_ops in the index?

I did try something like the below but it didn't get it to < 1s.

CREATE INDEX ix_name ON table
            USING gist (col1 gist_trgm_ops, col2 gist_trgm_ops, LOWER(col3) gist_trgm_ops, col4 gist_trgm_ops, account_id);

Upvotes: 1

Views: 2618

Answers (2)

jjanes
jjanes

Reputation: 44137

Your GiST index is not treating col4 the same way as the query treats it. Also, GiST indexes are sensitive to column order, and the column which can eliminate the most rows with the least work should go first. Presumably this would be account_id:

CREATE INDEX ix_name ON table
   USING gist (account_id, col1 gist_trgm_ops, col2 gist_trgm_ops, LOWER(col3) gist_trgm_ops, lower(col4) gist_trgm_ops);

The same index only in GIN rather than GiST form might be effective, you can try both and see. That one would not be sensitive to the order the columns are specified in the index.

The pg_trgm indexes operators support LIKE and ILIKE equally well. Rather than doing lower on two of the columns, you could just query with ILIKE rather than LIKE on the original columns and build the index on the original columns. This probably won't change the performance much, but the simpler indexes will be more flexible.

But is it really necessary to having the leading '%' in your queries? Do people expect autocomplete to work when they start typing from the middle?

Upvotes: 2

Laurenz Albe
Laurenz Albe

Reputation: 246163

If you perform a substring search, use a trigram index:

CREATE EXTENSION pg_trgm;

CREATE INDEX ON atable USING gin ((col1 || ' ' || col2 ... gin_trgm_ops);

Then query like

WHERE col1 || ' ' || col2 || ... ILIKE '%abcd%'

For decent performance, don't start searching until the string has reached a length of 4 or so.

Upvotes: 3

Related Questions