Chris Cirefice
Chris Cirefice

Reputation: 5805

Postgres to_tsvector VS ::tsvector

I don't really understand the difference between to_tsvector and ::tsvector in Postgres. I read the documentation on to_tsvectorhere, but there doesn't seem to be any documentation for the other, ::tsvector - that's a bit problematic. It is mentioned here, but it says something about normalizing before querying, and normalization is done through to_tsvector.

I created this SQL Fiddle to demonstrate the two; here is the code if you don't want to navigate away:

DDL:

CREATE TABLE text (
  text_id serial PRIMARY KEY,
  source_text text NOT NULL,
  destination_text text NOT NULL
);

and the SQL:

-- Throw some stuff in there
INSERT INTO text (source_text, destination_text) VALUES
('Hello', 'Hello Result'),
('With Comma, Query', 'WithComma, Result');

-- Forced to use punctuation in the query to match what is in the vector
SELECT T.source_text, T.destination_text
FROM   text T
WHERE  LOWER(T.source_text)::tsvector @@ LOWER('Hello')::tsquery;

-- Vector free of punctuation, don't include it in the query
SELECT T.source_text, T.destination_text
FROM   text T
WHERE  to_tsvector(LOWER(T.source_text)) @@ LOWER('Comma')::tsquery;

SELECT ts_debug('english', 'Something without a comma');

SELECT ts_debug('english', 'Something, with a comma');

It seems to me that to_tsvector will take the text, strip it of punctuation and return the vector. On the other hand, ::tsvector seems to include the punctuation in the vector, resulting in a need to use the same punctuation in the query.

What are the practical differences between the two? Is one generally preferred over the other? In what situations is each preferred?

Upvotes: 14

Views: 21695

Answers (1)

guettli
guettli

Reputation: 28005

to_tsvector(text) reads the string and does some normalization (taking language settings into account) on the string.

::tsvector is a cast. It does not do any normalization (and does not care for language settings).

See: http://www.postgresql.org/docs/current/interactive/functions-textsearch.html

Some years ago a wrote an own to_tsvector() in python, since I was not happy with the way this was handled in postgres. This gave me more control.

To insert the data into the column I used the tsvector cast:

'UPDATE myapp_mymodel SET content=%s::tsvector where id=%s', [tsvector, self.id])

Upvotes: 22

Related Questions