Adam
Adam

Reputation: 3138

tsvector only supports English?

I did the following:

ALTER TABLE blog_entry ADD COLUMN body_tsv tsvector;

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON blog_entry 
    FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(body_tsv, 'pg_catalog.english', body);

CREATE INDEX blog_entry_tsv ON blog_entry USING gin(body_tsv);

UPDATE blog_entry SET body_tsv=to_tsvector(body);

Now this is working:

SELECT title FROM blog_entry WHERE body_tsv @@ plainto_tsquery('hello world');

But when trying to search for non-English text, it's not working at all (no results).

I am using v9.2.2

Please help.

Upvotes: 3

Views: 4702

Answers (2)

Luiz Motta
Luiz Motta

Reputation: 31

I know it's been a while for this question, but I was searching about changing the FTS language and found an other solution. (and better than download a dictionary)

on Postgres CLI you can use the command to get a List of text search configurations: \dF

  • Check your current configuration:

show default_text_search_config;

  • Change your text search configuration to another language:

set default_text_search_config = 'pg_catalog.[language]';

Upvotes: 0

user330315
user330315

Reputation:

It's been a while since I played with this, but you need to create the ts_vector in the correct language, not the ts_query.

So when you update your table, use:

UPDATE blog_entry SET body_tsv=to_tsvector('german', body);

You can also extend the functionality and use an ispell dictionary to make stemming better to the text search engine (although it still won't be as sophisticated as e.g. Solr)

To do that, download the ISPELL dictionary that is e.g. contained in the OpenOffice German dictionary

The .oxt file is actually a .zip file, so you can simply extract its content.

Then copy the file de_DE_frami.dic to the PostgreSQL "share/tsearch_data" directory while changing the extension to .dict (which is what PostgreSQL expects.

Then copy the file de_DE_frami.aff to the same directory, changing the extension to .affix.

You need to convert both (text) files to UTF-8 in order for them to work with PostgreSQL

Then register that dictionary using:

CREATE TEXT SEARCH CONFIGURATION de_config (copy=german);

CREATE TEXT SEARCH DICTIONARY german_stem (
    TEMPLATE = snowball,
    Language = german
);

CREATE TEXT SEARCH DICTIONARY german_ispell (
    TEMPLATE = ispell,
    dictfile = de_DE_frami,
    afffile = de_de_frami
);

alter text search configuration de_config 
     alter mapping for asciiword WITH german_ispell, german_stem;

Once that is done, you can create your ts_vector using:

UPDATE blog_entry SET body_tsv=to_tsvector('de_config', body);

This is also described in the manual: http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

Upvotes: 10

Related Questions