Reputation: 12718
I want to allow multilingual search support for my app.
The PostgreSQL 9.6 Search Controls documentation says I need tsvector and tsquery to properly parse/normalize text. This works fine with Roman-based languages, but not with non-Roman characters.
Consider this search snippet:
where to_tsvector(title) @@ to_tsquery('hola')
I am looking for a title with "hola mi amiga", and it is found. However, given:
where to_tsvector(title) @@ to_tsquery('你') -- language = Chinese, code = zh-CN
I am looking for a title with 你好嗎, and it is not found.
What do I need to consider to make string normalization work with non-Roman characters?
Upvotes: 1
Views: 1169
Reputation: 1
Make sure the text search configuration is set correctly. From the documentation:
default_text_search_config (string) Selects the text search configuration that is used by those variants of the text search functions that do not have an explicit argument specifying the configuration. See Chapter 12 for further information. The built-in default is pg_catalog.simple, but initdb will initialize the configuration file with a setting that corresponds to the chosen lc_ctype locale, if a configuration matching that locale can be identified.
You can see the current value with SHOW default_text_search_config; or SELECT get_current_ts_config();
You can change it for the current session with SET default_text_search_config = newconfiguration;
Or you can set it per database with ALTER DATABASE <db> SET default_text_search_config = newconfiguration;
From Chapter 12, Full Text Search:
During installation an appropriate configuration is selected and default_text_search_config is set accordingly in postgresql.conf. If you are using the same text search configuration for the entire cluster you can use the value in postgresql.conf. To use different configurations throughout the cluster but the same configuration within any one database, use ALTER DATABASE ... SET. Otherwise, you can set default_text_search_config in each session.
Each text search function that depends on a configuration has an optional regconfig argument, so that the configuration to use can be specified explicitly. default_text_search_config is used only when this argument is omitted.
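To make the difference between the default and the explicit regconfig argument concrete, here is a minimal sketch. It assumes the built-in english and simple configurations are available, and the sample sentence is made up for illustration.

```sql
-- Session default: consulted only when no configuration argument is given.
SET default_text_search_config = 'pg_catalog.english';

-- No regconfig argument, so the session default (english here) is used.
SELECT to_tsvector('The cats are running');

-- Explicit regconfig argument: overrides the default for this call only.
SELECT to_tsvector('simple', 'The cats are running');

-- Pass the same configuration on both sides of @@ so the lexemes are comparable.
SELECT to_tsvector('simple', 'hola mi amiga') @@ to_tsquery('simple', 'hola');
```

Passing the configuration explicitly keeps the result independent of whatever default a given session or database happens to have.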
In psql, you can use \dF to see the text search configurations you have installed.
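If you are not in psql, roughly the same list can be read from the system catalogs. This is a sketch of an equivalent query, not the exact query psql runs; the column aliases are my own.

```sql
-- List installed text search configurations with their schema.
SELECT n.nspname AS schema_name,
       c.cfgname AS config_name
FROM pg_catalog.pg_ts_config AS c
JOIN pg_catalog.pg_namespace AS n ON n.oid = c.cfgnamespace
ORDER BY schema_name, config_name;
```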
So what you want is something like this:
where to_tsvector('newconfig', title) @@ to_tsquery('newconfig', '你')
I can't say which configuration will properly parse and stem that language, so replace newconfig with one that does.
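To check what a candidate configuration actually does with the title, you can inspect the tokens it produces. The sketch below uses the built-in simple configuration only as a stand-in; the output depends on your database encoding and locale, so I am not showing expected results.

```sql
-- One row per token: shows how the parser splits the title
-- and which lexemes the configuration's dictionaries emit.
SELECT alias, token, lexemes
FROM ts_debug('simple', '你好嗎');

-- Compare the document lexemes with the query lexemes side by side.
SELECT to_tsvector('simple', '你好嗎') AS doc,
       to_tsquery('simple', '你')     AS query;

-- Prefix search: can match only if the parser kept 你好嗎 as a single token.
-- This is an experiment, not a substitute for real word segmentation.
SELECT to_tsvector('simple', '你好嗎') @@ to_tsquery('simple', '你:*') AS prefix_match;
```

If the whole title comes back as a single token, a query for 你 alone cannot match it, which would explain the behavior in the question.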
Upvotes: 1