latentflip
latentflip

Reputation: 1883

Performance implication of LIKE query when operating on a subset of full table

I appreciate that LIKE queries are slow as they cannot be indexed. However, I am curious about the performance hit in a situation like this:

Say I have a table like:

user_id  |  message 
-------------------
   1     |  foo bar baz
   1     |  bar buz qux
   .     .      .
   .     .      .
   2     |  bux bar foo
   2     |  bar

where I have say 1 million rows, but 10,000 users, so each user has about 100 messages.

Clearly a search like:

SELECT * FROM table WHERE message like '%ar%';

is going to be very slow. However in my application I would only ever search a user's messages:

SELECT * FROM table WHERE message like '%ar%' AND user_id = 2;

where the user_id column would be indexed.

Am I right in thinking that in a scenario like this, Postgres would only ever perform the slow LIKE query on the users ~100 rows, after using the indexed user_id column, rather than the full table - thus limiting my performance hit?

And also that a query like this wouldn't get significantly slower with 10 or 100 million users, as long as any one user only had ~100 messages?

Upvotes: 4

Views: 3664

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656251

MatBailie already cleared your primary question. I want to address an assertion of yours:

I appreciate that LIKE queries are slow as they cannot be indexed.

This is not entirely true.

Firstly, and this has been true for a long time, left anchored patterns can use an index. This works for regular expressions (~) as well as LIKE (~~) and SIMILAR TO. I recently wrote a comprehensive review on the matter on dba.SE:

This may not work for you, because the patterns in your question are not anchored. If they were you could get optimized performance with a multicolumn index that uses the text pattern operator class text_pattern_ops for the message column like this:

CREATE INDEX tbl_user_id_message_idx ON tbl (user_id, message text_pattern_ops);

For queries like:

SELECT *
FROM   tbl
WHERE  user_id = 2
AND    message ~~ 'bar%'; -- left anchored LIKE

Secondly, since PostgreSQL 9.1 you can use the pg_trgm extension and create a GIST or GIN index with it, that all patterns can use. Some limitations apply. Maintenance of such an index is more expensive, so it is most useful for read-only or rarely written tables. Details:

Depesz has a related tutorial.

Upvotes: 8

MatBailie
MatBailie

Reputation: 86706

The optimiser determines many things when compiling SQL into a plan.

One of them is how to filter data (with index seeks, etc) before applying other conditions on a row by row basis.


In your case, provided you have a suitable index the LIKE will only be applied to the records after that filtering is done.


To understand a bit more about it, get the plan that is created by your query. You should be able to see where indexes are used to sub-set/filter the data, and then a separate step applying the LIKE condition.

Upvotes: 3

Related Questions