Kirill Salykin
Kirill Salykin

Reputation: 711

Concat postgresql tsvectos without position offset

Based on https://www.postgresql.org/docs/13/textsearch-features.html

tsvector || tsvector
The tsvector concatenation operator returns a vector which combines the lexemes and positional information of the two vectors given as arguments. Positions and weight labels are retained during the concatenation. Positions appearing in the right-hand vector are offset by the largest position mentioned in the left-hand vector, so that the result is nearly equivalent to the result of performing to_tsvector on the concatenation of the two original document strings. (The equivalence is not exact, because any stop-words removed from the end of the left-hand argument will not affect the result, whereas they would have affected the positions of the lexemes in the right-hand argument if textual concatenation were used.)

One advantage of using concatenation in the vector form, rather than concatenating text before applying to_tsvector, is that you can use different configurations to parse different sections of the document. Also, because the setweight function marks all lexemes of the given vector the same way, it is necessary to parse the text and do setweight before concatenating if you want to label different parts of the document with different weights.

Thus this query

select 'a:1 b:2'::tsvector || 'a:1 c:2 b:3'::tsvector;

will result in 'a':1,3 'b':2,5 'c':4

Please advice is there a way to merge several tsvectors while preserving original positions (something similar to this):

select concat_with_preserving('a:1 b:2'::tsvector, 'a:1 c:2 b:3'::tsvector);

so it is equal to 'a':1 'b':2,3 'c':2, eg same positions deduplicated and different positions are just merged (w/o offset).

Thanks!

Upvotes: 1

Views: 418

Answers (1)

jjanes
jjanes

Reputation: 44323

Convert then to text, then concatenate them with spaces between, then convert them back.

(a::text || ' ' || b::text)::tsvector

Upvotes: 3

Related Questions