user3871
user3871

Reputation: 12716

Postgres UPDATE to_tsvector updating all rows to same value

I want to set the to_tsvector language (e.g.: 'French') so it uses the proper dictionary when rendering the FTS vector.

Table messages has a locale_id column, which is on the locales table. Which I then need to join the locales table to the languages table on locale_id to get the actual language name.

This UPDATE is supposed to loop through all rows in messages and set the vector column = to_tsvector(joined language name, message), yet it updated each row to the same value and same language dictionary (e.g.: to_tsvector('french', stringX).

Why is this? Each row has a different message string and a different locale_id (meaning, different language name).

So if I simply disregard the language config for pg_dictionary and do:

  UPDATE messages 
  SET vector = to_tsvector(message);

Table results:

Messages:

message   | locale_id | vector
-----------------------------
Hi there  | 1         | 'hi':1
Is a test | 2         | 'test':3
Le french | 3         | 'french':2 'le':1 --'le' SHOULD BE omitted since it's a stop word in French pg_dictionary

This works fine. Obviously, it's not loading in the correct language dict for each row. However, doing the following yields the same result for each row:

  UPDATE messages 
  SET vector = to_tsvector(messages_languages.language::regconfig, messages_languages.message) 
  FROM (
    select t3.language, t1.message 
    from messages as t1 
    inner join locales as t2 on (t1.locale_id = t2.id) 
    inner join languages as t3 on (t2.language_id = t3.id)
  ) messages_languages;

I've also tried it with a WITH, same results:

 WITH messages_languages as (
    select t3.language, t1.message 
    from messages as t1 
    inner join locales as t2 on (t1.locale_id = t2.id) 
    inner join languages as t3 on (t2.language_id = t3.id)  
  )
  UPDATE messages
  SET vector = to_tsvector(messages_languages.language::regconfig, messages_languages.message) 
  FROM messages_languages;

Table results:

Messages:

message   | locale_id | vector
-----------------------------
Hi there  | 1         | 'french':2
Is a test | 2         | 'french':2
Le french | 3         | 'french':2  --'le' omitted correctly in french pg_dictionary as it's a STOP word

'french', for pg_dictionary_name = 'French', should be the only 'french':2 vector result in this table, yet all rows are the same

Locales:

id        | language_id    
------------------
1         | 4         
2         | 5       
3         | 6  

Languages:

id        | language    
------------------
4         | 'English'         
5         | 'German'       
6         | 'French'     

Upvotes: 0

Views: 1279

Answers (2)

joop
joop

Reputation: 4523

  • you dont need the subquery
  • you don't need to reselect messages, (the target table is already in the range table)
  • you need to associate the source-query to the result-row

UPDATE messages msg
  SET vector = to_tsvector(lang.language::regconfig, msg.message)
  FROM locales as loco
  JOIN languages as lang ON loco.language_id = lang.id
  WHERE msg.locale_id = loco.id
     ;

Upvotes: 1

user3871
user3871

Reputation: 12716

It turns out you must check the ID of the aliased subquery to the rows you're iterating through on the same table in UPDATE:

  UPDATE messages 
  SET vector = to_tsvector(messages_languages.language::regconfig, messages_languages.message) 
  FROM (
    select t1.id, t3.language, t1.message 
    from messages as t1 
    inner join locales as t2 on (t1.locale_id = t2.id) 
    inner join languages as t3 on (t2.language_id = t3.id)
  ) messages_languages
  -- Need to make sure you're referencing the same row in the subquery by comparing IDs
  WHERE messages.id = messages_languages.id;

Upvotes: 0

Related Questions