Reputation: 1355
I have a table with a bunch of cols I have created a full text index on a table like this:
CREATE INDEX phrasetable_exp_idx ON msc.mytable
USING gin(to_tsvector('norwegian', coalesce(msc.mytable.col1,'') || ' ' ||
coalesce(msc.mytable.col2,'') || ' ' ||
coalesce(msc.mytable.col3,'') || ' ' ||
coalesce(msc.mytable.col4,'') || ' ' ||
coalesce(msc.mytable.col5,'') || ' ' ||
coalesce(msc.mytable.col6,'') || ' ' ||
coalesce(msc.mytable.col7,'')));
I try some searches and they are lightning fast, however, for one particular search I don't get the expected results. I have a row in my table where both col1 and col2 have the exact value "Importkompetanse Oslo AS" in col3 it has the value "9999". Only the query to_tsquery('9999') returns the row, which shows me that it does have the value "Importkompetanse Oslo AS" in the both col1 and col2, but the first two queries return no matches.
SELECT *
FROM msc.mytable
WHERE to_tsvector('norwegian', coalesce(msc.col1,'') || ' ' ||
coalesce(msc.mytable.col2,'') || ' ' ||
coalesce(msc.mytable.col3,'') || ' ' ||
coalesce(msc.mytable.col4,'') || ' ' ||
coalesce(msc.mytable.col5,'') || ' ' ||
coalesce(msc.mytable.col6,'') || ' ' ||
coalesce(msc.mytable.col7,'')));
@@ --to_tsquery('Importkompetanse&Oslo&AS') -- nada
plainto_tsquery('Importkompetanse') -- nada
--to_tsquery('9999') -- OK!
Does anyone have an idea why my searches yields no results?
EDIT:
For some reason, to_tsquery returns something like this: "'9999':9 'importkompetans':1,6" The word importkompetanse seems to be cut off?
However, if I set it to simple instead of norwegian, I get the expected results and everything looks good. Why is that?
Upvotes: 1
Views: 977
Reputation: 36234
You used cross configuration between your tsvector
and tsquery
values. You should use consistent configuration, like:
select to_tsvector('norwegian', 'Importkompetanse Oslo AS')
@@ to_tsquery('norwegian', 'Importkompetanse&Oslo&AS');
This is why it worked with the 'simple'
configuration (that is your default).
Note: you can always debug text search with ts_debug()
: f.ex. 'Importkompetanse'
has not been cut off, 'importkompetans'
is just the appropriate lexeme for this word (in the 'norwegian'
configuration).
Off: you use a really long, expression-based index, which will only be used, if you use the exact expression in your queries too. You used it right in your example, but this makes your queries really long, and if you change your index expression some time later, you need to make sure all "uses" updated as well.
You could use a simple (sql) function, to simplify your queries:
create or replace function col_tsvector(mytable)
returns tsvector
immutable
language sql
as $function$
return to_tsvector('norwegian',
coalesce($1.col1, '') || ' ' ||
coalesce($1.col2, '') || ' ' ||
coalesce($1.col3, '') || ' ' ||
coalesce($1.col4, '') || ' ' ||
coalesce($1.col5, '') || ' ' ||
coalesce($1.col6, '') || ' ' ||
coalesce($1.col7, ''))
$function$;
With this, you can greatly simplify your index definition & your queries too. (You can even use the attribute notation.)
Upvotes: 1