Reputation: 481
The Postgresql documentation mentions custom ranking functions when the in-built ts_rank and ts_rank_cd functions are insufficient. Is there any documentation/examples how to implement one?
from the postgres docs - The built-in ranking functions are only examples. You can write your own ranking functions and/or combine their results with additional factors to fit your specific needs.
Upvotes: 10
Views: 654
Reputation: 141
The definition of the original ts_rank(tsvector,tsquery)
is:
CREATE OR REPLACE FUNCTION pg_catalog.ts_rank(tsvector, tsquery)
RETURNS real
LANGUAGE internal
IMMUTABLE PARALLEL SAFE STRICT
AS $function$ts_rank_tt$function$
So, we can simply define a function taking a tsvector and tsquery as input and returning a real. By convention, the higher the result, the higher the ranking.
Thus, we can easily create our own function (tested with PostgreSQL 12):
CREATE OR REPLACE FUNCTION public.ts_rank_custom(v tsvector, q tsquery)
RETURNS real
LANGUAGE plpgsql
IMMUTABLE PARALLEL SAFE STRICT
AS $function$
DECLARE
subtotal REAL;
ta text[];
i bigint;
BEGIN
SELECT INTO subtotal ts_rank(v,q);
ta = tsvector_to_array(v);
FOR i IN 1 .. array_upper(ta,1)
LOOP
IF length(ta[i])<4 THEN
subtotal = subtotal * 1.1;
END IF;
END LOOP;
RETURN subtotal;
END;
$function$;
Our function takes the value from the original ts_rank
as a starting value and then increases it by 10% for every token in tsvector shorter than 4 characters (we like short words).
Here is an example result on our test data:
# SELECT ts,ts_rank(ts,query),ts_rank_custom(ts,query)
FROM bench,to_tsquery('english','dog') as query
WHERE query @@ ts order by 3 desc;
ts | ts_rank | ts_rank_custom
-----------------------------------------------------------------------------------------------------------------------------+------------+----------------
'bet' 'bunnies' 'dog' 'heaps' 'potluck' 'purus' 'respires' 'shininess' 'viennese' | 0.06079271 | 0.07355918
'antagonizing' 'are' 'broths' 'discontinuing' 'dog' 'estuary' 'rosins' 'tanzania' | 0.06079271 | 0.07355918
'borderlines' 'brunei' 'cutthroat' 'dog' 'ego' 'herrick' 'neptunium' 'scoffs' | 0.06079271 | 0.07355918
'clock' 'decrescendos' 'dog' 'obsolete' 'peg' 'robocalls' 'rote' 'rousing' 'wabash' | 0.06079271 | 0.07355918
'absenting' 'comfortingly' 'dog' 'laurence' 'mascots' 'monitors' 'properest' 'scratch' 'terrific' 'unsoundest' 'victualled' | 0.06079271 | 0.06687198
'advocates' 'angiosperm' 'dog' 'embarrassments' 'leafleting' 'sacrileges' 'toneless' 'whets' | 0.06079271 | 0.06687198
'creeping' 'dissembles' 'distaste' 'dog' 'escapee' 'excusable' 'mickey' 'noose' 'onus' 'physiologists' 'taklamakan' | 0.06079271 | 0.06687198
'abusive' 'buddhists' 'comedienne' 'dog' 'gust' 'opts' 'pressed' | 0.06079271 | 0.06687198
'chungking' 'depositor' 'dog' 'jutted' 'nothings' 'warfare' | 0.06079271 | 0.06687198
'aconite' 'dog' 'domesticity' 'dowdies' 'pupils' 'undistinguished' | 0.06079271 | 0.06687198
'contaminant' 'disinters' 'dog' 'embroidery' 'huskiest' 'lender' 'outlines' 'oversize' 'professor' 'tractors' | 0.06079271 | 0.06687198
'dog' 'leavens' 'overtly' 'plucked' 'transfigure' | 0.06079271 | 0.06687198
(12 rows)
You can see that our ts_rank_custom favors entries with shorter words.
Basically, you can implement any ranking logic you want inside a function with the given signature and use it instead of the standard ts_rank* functions.
Upvotes: 2