Reputation: 122
I have a pl/pgsql script that needs to check if a word/sentence is in a string, and it must take care of word boundaries, and case insenstive.
Example:
So the obvious solution is to use a regex, like this:
select _label ~* (E'\\y' || _pattern || E'\\y') into _match;
It works but is slow, compared to a simple
select _label ilike '%' || _pattern || '%' into _match;
This is wrapped in a function that my script calls A LOT (in the tens of millions, I do a lot of recursion), and with this requirement the overall runtime doubled.
Now my question is, is there a faster way to implement this ?
Thanks.
EDIT: ended up using this:
if _label ilike '%' || _pattern || '%' then
select _label ~* (E'\\m' || _pattern || E'\\M') into _match;
end if;
and it is significantly faster.
Upvotes: 1
Views: 1583
Reputation: 11520
I would consider the full text search capabilities, but from what you're describing, I'd likely implement this using PostgreSQL arrays.
First: define a function that takes a label, lowercases it (or uppercase if you prefer), splits it on word boundaries, and returns an array. Say:
CREATE OR REPLACE FUNCTION label_to_array(text) RETURNS text[] AS $$
SELECT regexp_split_to_array(lower($1), E'\\W');
$$ LANGUAGE sql IMMUTABLE;
$ select label_to_array('my label xx zz yy');
label_to_array
---------------------
{my,label,xx,zz,yy}
Now, create a GIN index over this function:
CREATE INDEX sometable_label_array_key ON sometable
USING GIN((label_to_array(label));
From here, PostgreSQL can use this index for many queries involving array operators, such as "contains":
SELECT *
FROM sometable
WHERE label_to_array(label) @> label_to_array('my label');
This query would split 'my label'
into {my,label}
, and would then use the index to find a list of rows containing my
, intersect that with the list of rows containing label
, and then return the result. This isn't exactly equivalent to your original query (since it doesn't check their order), but since it uses an index to eliminate most of the rows in the table, adding the original check on the end would work just fine:
SELECT *
FROM sometable
WHERE label_to_array(label) <@ label_to_array('my label')
AND label ~* (E'\\y' || 'my label' || E'\\y');
Upvotes: 2