Olivier D
Olivier D

Reputation: 453

On AWS RDS Postgres, how to have dictionaries and unaccented full-text search?

I want to use PostgreSQL 9.5.4 on AWS RDS, leveraging the full-text search, dictionaries with stop words, unaccented full-text search.

The context:

With 'unaccent' on, this full-text (Json) query will find 'F(e-acute)vrier' even if I did not enter the accent properly

psql>select * from proto_model.product where to_tsvector((body ->> 'description')) @@ to_tsquery('Fevrier');

With an English dictionary, the same search for 'the', 'any', 'you'... will find nothing as they are 'stop words' defined in the English dictionary and ignored.

The issue:

On my local Postgres, this is not an issue at all. On the managed AWS this is one. EC2+Docker on AWS would not be an issue of course but I focus on RDS Postgres for now.

Locally, the default value of default_text_search_config (get it with psql>show all) is 'pg_catalog.english' which uses the English dictionary and stop words. On RDS this is 'pg_catalog.simple'.

1) In AWS, I cannot add a dictionary or modify one as you need file system access that you don't have. There is no programatical solution to create/update a dictionary AFAIK.

2) In AWS, as the 'postgres' user or even as the 'rds_superuser' that you can create, I cannot alter the global configuration

psql>ALTER SYSTEM SET default_text_search_config = 'pg_catalog.english';
ERROR:  must be superuser to execute ALTER SYSTEM command

In addition, it is absent of the RDS Postgres parameter groups that you can associate to a new Postgres instance and you cannot add missing values! Granting more to 'rds_superuser' (psql>grant all on schema public to ...) does not help.

3) In AWS, as 'postgres' or 'rds_superuser', I can however set for my session, the current text configuration

psql>set default_text_search_config = 'pg_catalog.english'; 
SET

4) Sadly in AWS, as 'postgres' or 'rds_superuser', I cannot alter the search configuration (global) to ignore the accents. This works fine locally.

psql>ALTER TEXT SEARCH CONFIGURATION english ALTER MAPPING FOR hword, hword_part, word WITH unaccent, english_stem;
ERROR:  must be owner of text search configuration english

5) In AWS, as 'postgres' or 'rds_superuser', I can create a new search configuration (English + Unaccent) but I cannot set it as the default even in my session!

psql>CREATE TEXT SEARCH CONFIGURATION english2 (copy=english);
CREATE...
psql>ALTER TEXT SEARCH CONFIGURATION english2 ALTER MAPPING FOR hword, hword_part, word WITH unaccent, english_stem;
ALTER...
psql>set default_text_search_config = 'pg_catalog.english2';
ERROR:  invalid value for parameter "default_text_search_config": "pg_catalog.english2"

So it seems that I am cooked.

The best that I can see could be to associate automatically without a need for individual psql>set default_text_search_config = ... my user connection to a set of configuration options

psql>alter role somerole set default_text_search_config = 'pg_catalog.english';
psql>select * from pg_user; (the option is present by default for all my connections under this role)

Outside of moving out of AWS RDS to EC2+Docker, do you know any solution to (4) or (5) providing me dictionary+unaccent?

Upvotes: 21

Views: 5101

Answers (2)

hellabarger
hellabarger

Reputation: 121

You can change by altering the role parameters in the database like:

ALTER ROLE [role] IN DATABASE [database]
    SET default_text_search_config TO 'pg_catalog.english';

Upvotes: 11

Robin Thomas
Robin Thomas

Reputation: 49

Don't rely on the default_text_search_config.

Instead, create your own text search configuration as you do in #5. Then use the two-argument form of to_tsvector to specify that custom text search configuration instead of the default:

SELECT * from mytable where to_tsvector('myconfig', description) 
    @@ to_tsquery('cat & dog')

An additional benefit of the two-argument version of to_tsvector is that it allows you to use an "expression index" to power text search, rather than a separate tsvector column in your table:

CREATE_INDEX mytable_tsv_idx ON mytable USING GIN
    (to_tsvector('myconfig', description));
-- This query will use the index
SELECT * from mytable WHERE to_tsvector('myconfig', description) 
    @@ to_tsquery('cat & dog');
-- This query, despite setting the default config, 
-- will not use the expression index.
SET default_text_search_config = 'myconfig';
SELECT * from mytable WHERE to_tsvector(description) 
    @@ to_tsquery('cat & dog');

https://www.postgresql.org/docs/9.5/static/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

Upvotes: 4

Related Questions