actionshrimp
actionshrimp

Reputation: 5229

Postgres ts_rank_cd different result for same tsvector?

I'm testing out a text search query on two separate database servers, both running Postgres 9.4.4.

The row in question has the same data, and I get the same underlying tsvector on both servers:

SELECT
  user_id,
  TO_TSVECTOR('english', REGEXP_REPLACE(first_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
  TO_TSVECTOR('english', REGEXP_REPLACE(last_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
  TO_TSVECTOR('english', REGEXP_REPLACE(username, '[^a-zA-Z0-9]', ' ', 'g'))
FROM users_v1 where user_id = 123;

-- On server A:
-- user_id    | to_tsvector
-- -----------+----------------------
-- 123        | 'georg':1 'hickman':2
-- (1 row)

-- On server B:
-- user_id    | to_tsvector
-- -----------+----------------------
-- 123        | 'georg':1 'hickman':2
-- (1 row)

However I get a different rank when using this vector to run a query:

SELECT username,
 TS_RANK_CD(
   TO_TSVECTOR('english', REGEXP_REPLACE(first_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
   TO_TSVECTOR('english', REGEXP_REPLACE(last_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
   TO_TSVECTOR('english', REGEXP_REPLACE(username, '[^a-zA-Z0-9]', ' ', 'g'))
 , PLAINTO_TSQUERY('george'))
FROM users WHERE user_id = 123;

-- On server A:
-- user_id    | ts_rank_cd
-- -----------+----------------------
-- 123        | 0.2
-- (1 row)

-- On server B:
-- user_id    | ts_rank_cd
-- -----------+----------------------
-- 123        | 0.0
-- (1 row)

Is the vector the only input to the rank function, or are there any server settings/anything else that affect the behaviour of ts_rank_cd? Is all the information stored in the vector displayed in the console output, or is there some hidden difference in it which I'm not seeing? If not what could be causing the discrepancy?

Upvotes: 3

Views: 1090

Answers (1)

actionshrimp
actionshrimp

Reputation: 5229

Thanks to jjanes's comment, I realised that PLAINTO_TSQUERY also accepts an optional argument to specify the text search configuration. Doing

SELECT PLAINTO_TSQUERY('george');

returned george on one system and georg on the other, but doing

SELECT PLAINTO_TSQUERY('english', 'george');

returned the same on both and caused the expected ranking. Doing

SHOW default_text_search_config;

Revealed pg_catalog.english was set as the default on one system and pg_catalog.simple on the other. The discrepancy can then be fixed by either explicitly passing the config in when creating the TSQUERY, or updating the default config so they are the same on both DBs.

Upvotes: 1

Related Questions