Reputation: 5229
I'm testing out a text search query on two separate database servers, both running Postgres 9.4.4.
The row in question has the same data, and I get the same underlying tsvector
on both servers:
SELECT
user_id,
TO_TSVECTOR('english', REGEXP_REPLACE(first_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
TO_TSVECTOR('english', REGEXP_REPLACE(last_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
TO_TSVECTOR('english', REGEXP_REPLACE(username, '[^a-zA-Z0-9]', ' ', 'g'))
FROM users_v1 where user_id = 123;
-- On server A:
-- user_id | to_tsvector
-- -----------+----------------------
-- 123 | 'georg':1 'hickman':2
-- (1 row)
-- On server B:
-- user_id | to_tsvector
-- -----------+----------------------
-- 123 | 'georg':1 'hickman':2
-- (1 row)
However I get a different rank when using this vector to run a query:
SELECT username,
TS_RANK_CD(
TO_TSVECTOR('english', REGEXP_REPLACE(first_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
TO_TSVECTOR('english', REGEXP_REPLACE(last_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
TO_TSVECTOR('english', REGEXP_REPLACE(username, '[^a-zA-Z0-9]', ' ', 'g'))
, PLAINTO_TSQUERY('george'))
FROM users WHERE user_id = 123;
-- On server A:
-- user_id | ts_rank_cd
-- -----------+----------------------
-- 123 | 0.2
-- (1 row)
-- On server B:
-- user_id | ts_rank_cd
-- -----------+----------------------
-- 123 | 0.0
-- (1 row)
Is the vector the only input to the rank function, or are there any server settings/anything else that affect the behaviour of ts_rank_cd
? Is all the information stored in the vector displayed in the console output, or is there some hidden difference in it which I'm not seeing? If not what could be causing the discrepancy?
Upvotes: 3
Views: 1090
Reputation: 5229
Thanks to jjanes
's comment, I realised that PLAINTO_TSQUERY also accepts an optional argument to specify the text search configuration. Doing
SELECT PLAINTO_TSQUERY('george');
returned george
on one system and georg
on the other, but doing
SELECT PLAINTO_TSQUERY('english', 'george');
returned the same on both and caused the expected ranking. Doing
SHOW default_text_search_config;
Revealed pg_catalog.english
was set as the default on one system and pg_catalog.simple
on the other. The discrepancy can then be fixed by either explicitly passing the config in when creating the TSQUERY, or updating the default config so they are the same on both DBs.
Upvotes: 1