Reputation: 154594
The full text search ranking documentation suggests that:
"You can write your own ranking functions and/or combine their results with additional factors to fit your specific needs."
But I haven't been able to find any examples of how custom ranking functions can be built.
Specifically, I haven't been able to figure out how to extract the list of lexemes in a tsvector which match a given tsquery… something like this:
> SELECT ts_matching_lexemes('cat in the hat'::tsvector, 'cat'::tsquery);
ts_matching_lexemes
------------------
'cat':1
So, how can I figure out which lexemes in a tsvector match a given tsquery?
Upvotes: 5
Views: 1431
Reputation: 6352
It looks like the ts_headline function already does this internally, but the logic is deep in the C source and the function outputs a string. You can, however, use it to mark the matching lexemes and then parse the resulting string (this is relatively slow compared with the C functions):
Code:
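CREATE OR REPLACE FUNCTION ts_matching_lexemes(tsv tsvector, tsq tsquery)
RETURNS tsvector AS
$$
WITH
-- Mark every matching lexeme in the tsvector's text form with <;> on both sides.
proc AS (
    SELECT
        ts_headline(tsv::text, tsq, 'StartSel = <;>, StopSel = <;>') AS tsh
)
-- Split on the marker: even-numbered parts are the marked lexemes.
, parts AS (
    SELECT unnest(regexp_split_to_array(tsh, '<;>')) AS p FROM proc
)
-- Pair each part with the following one, which begins with the lexeme's positions.
, parts_enum AS (
    SELECT p, lead(p, 1) OVER () AS next_p, row_number() OVER () AS rn FROM parts
)
-- Rebuild lexeme:positions entries and cast the result back to tsvector.
SELECT (string_agg(p || substring(split_part(next_p, ' ', 1) FROM 2), ' '))::tsvector
FROM parts_enum
WHERE rn % 2 = 0
$$
LANGUAGE SQL;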
e.g.:
select ts_matching_lexemes(to_tsvector('cat in the hat'), to_tsquery('cat'))
union
select ts_matching_lexemes(to_tsvector('cats and bikes in the hat'), to_tsquery('cat & bike'))
ts_matching_lexemes
tsvector
-------------------
'cat':1
'bike':3 'cat':1
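Tying this back to the goal of a custom ranking function, a minimal sketch might count the matched lexemes per document and sort by that count. It assumes the ts_matching_lexemes function above has been created; the sample documents, the d and q aliases and the matched column are made up for illustration. length(tsvector) returns the number of lexemes in a vector, and coalesce covers the case where the function returns NULL because ts_headline marked nothing.

```sql
-- Hypothetical sample data; in practice d would be a real table with a text column.
SELECT d.doc,
       -- Number of distinct lexemes that ts_headline marked as matching the query.
       coalesce(length(ts_matching_lexemes(to_tsvector(d.doc), q.tsq)), 0) AS matched
FROM (VALUES ('cat in the hat'),
             ('cats and bikes in the hat'),
             ('a dog on a log')) AS d(doc)
CROSS JOIN to_tsquery('cat | bike') AS q(tsq)
ORDER BY matched DESC;
```

The count is only one possible factor; it could be combined with ts_rank or other signals in the same ORDER BY expression.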
notes:
Passing the tsvector to ts_headline is to reduce redundant work compared with ts_headline(text, to_tsquery(...)).
The function can be sped up by removing the CTEs.
Relevant match operator: tsvector @@ tsquery
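One caveat worth checking for longer documents: by default ts_headline returns an excerpt rather than the whole input (the documented defaults are MaxWords = 35 and MinWords = 15), so lexemes outside that excerpt may not be marked. The documented HighlightAll option makes ts_headline use the whole document as the headline. A sketch of the marking step with that option added follows; the sample text and the s, tsv, tsq and marked names are made up, and the speed impact has not been measured here.

```sql
-- Same marking step as in ts_matching_lexemes, but over the whole input.
SELECT ts_headline(
           s.tsv::text,
           s.tsq,
           'StartSel = <;>, StopSel = <;>, HighlightAll = true'
       ) AS marked
FROM (
    -- Hypothetical input; substitute a real tsvector column and query.
    SELECT to_tsvector('cats and bikes in the hat') AS tsv,
           to_tsquery('cat & bike') AS tsq
) AS s;
```

If this turns out to be needed, the same options string could be used inside the function's proc step.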
Upvotes: 5