Lars Anundskås
Lars Anundskås

Reputation: 1355

Postgresql - full text search index - unexpected query results

I have a table with a bunch of cols I have created a full text index on a table like this:

CREATE INDEX phrasetable_exp_idx ON msc.mytable 
USING gin(to_tsvector('norwegian', coalesce(msc.mytable.col1,'') || ' ' || 
                 coalesce(msc.mytable.col2,'') || ' ' || 
                 coalesce(msc.mytable.col3,'') || ' ' ||
                 coalesce(msc.mytable.col4,'') || ' ' ||
                 coalesce(msc.mytable.col5,'') || ' ' ||
                 coalesce(msc.mytable.col6,'') || ' ' ||
                 coalesce(msc.mytable.col7,'')));

I try some searches and they are lightning fast, however, for one particular search I don't get the expected results. I have a row in my table where both col1 and col2 have the exact value "Importkompetanse Oslo AS" in col3 it has the value "9999". Only the query to_tsquery('9999') returns the row, which shows me that it does have the value "Importkompetanse Oslo AS" in the both col1 and col2, but the first two queries return no matches.

SELECT *
FROM msc.mytable
WHERE to_tsvector('norwegian', coalesce(msc.col1,'') || ' ' || 
                 coalesce(msc.mytable.col2,'') || ' ' || 
                 coalesce(msc.mytable.col3,'') || ' ' ||
                 coalesce(msc.mytable.col4,'') || ' ' ||
                 coalesce(msc.mytable.col5,'') || ' ' ||
                 coalesce(msc.mytable.col6,'') || ' ' ||
                 coalesce(msc.mytable.col7,'')));
@@ --to_tsquery('Importkompetanse&Oslo&AS') -- nada
   plainto_tsquery('Importkompetanse') -- nada
   --to_tsquery('9999') -- OK!

Does anyone have an idea why my searches yields no results?

EDIT:

For some reason, to_tsquery returns something like this: "'9999':9 'importkompetans':1,6" The word importkompetanse seems to be cut off?

However, if I set it to simple instead of norwegian, I get the expected results and everything looks good. Why is that?

Upvotes: 1

Views: 977

Answers (1)

pozs
pozs

Reputation: 36234

You used cross configuration between your tsvector and tsquery values. You should use consistent configuration, like:

select to_tsvector('norwegian', 'Importkompetanse Oslo AS')
       @@ to_tsquery('norwegian', 'Importkompetanse&Oslo&AS');

SQLFiddle

This is why it worked with the 'simple' configuration (that is your default).

Note: you can always debug text search with ts_debug(): f.ex. 'Importkompetanse' has not been cut off, 'importkompetans' is just the appropriate lexeme for this word (in the 'norwegian' configuration).

Off: you use a really long, expression-based index, which will only be used, if you use the exact expression in your queries too. You used it right in your example, but this makes your queries really long, and if you change your index expression some time later, you need to make sure all "uses" updated as well.

You could use a simple (sql) function, to simplify your queries:

create or replace function col_tsvector(mytable)
  returns tsvector
  immutable
  language sql
  as $function$
return to_tsvector('norwegian',
  coalesce($1.col1, '') || ' ' || 
  coalesce($1.col2, '') || ' ' || 
  coalesce($1.col3, '') || ' ' ||
  coalesce($1.col4, '') || ' ' ||
  coalesce($1.col5, '') || ' ' ||
  coalesce($1.col6, '') || ' ' ||
  coalesce($1.col7, ''))
$function$;

With this, you can greatly simplify your index definition & your queries too. (You can even use the attribute notation.)

Upvotes: 1

Related Questions