Kiran
Kiran

Reputation: 481

custom ranking in postgreSQL full text search

The Postgresql documentation mentions custom ranking functions when the in-built ts_rank and ts_rank_cd functions are insufficient. Is there any documentation/examples how to implement one?

from the postgres docs - The built-in ranking functions are only examples. You can write your own ranking functions and/or combine their results with additional factors to fit your specific needs.

Upvotes: 10

Views: 654

Answers (1)

ErMejo
ErMejo

Reputation: 141

The definition of the original ts_rank(tsvector,tsquery) is:

CREATE OR REPLACE FUNCTION pg_catalog.ts_rank(tsvector, tsquery)
 RETURNS real
 LANGUAGE internal
 IMMUTABLE PARALLEL SAFE STRICT
AS $function$ts_rank_tt$function$

So, we can simply define a function taking a tsvector and tsquery as input and returning a real. By convention, the higher the result, the higher the ranking.

Thus, we can easily create our own function (tested with PostgreSQL 12):

CREATE OR REPLACE FUNCTION public.ts_rank_custom(v tsvector, q tsquery)
 RETURNS real
 LANGUAGE plpgsql
 IMMUTABLE PARALLEL SAFE STRICT
AS $function$
        DECLARE 
                subtotal REAL;
                ta text[];
                i bigint;
        BEGIN
                SELECT INTO subtotal ts_rank(v,q);

                ta = tsvector_to_array(v);

                FOR i IN 1 .. array_upper(ta,1)
                LOOP
                 IF length(ta[i])<4 THEN
                  subtotal = subtotal * 1.1;
                 END IF;
                END LOOP;
                RETURN subtotal;
        END;
        $function$;

Our function takes the value from the original ts_rank as a starting value and then increases it by 10% for every token in tsvector shorter than 4 characters (we like short words).

Here is an example result on our test data:

# SELECT ts,ts_rank(ts,query),ts_rank_custom(ts,query)
        FROM bench,to_tsquery('english','dog') as query
        WHERE query @@ ts order by 3 desc;
                                                             ts                                                              |  ts_rank   | ts_rank_custom 
-----------------------------------------------------------------------------------------------------------------------------+------------+----------------
 'bet' 'bunnies' 'dog' 'heaps' 'potluck' 'purus' 'respires' 'shininess' 'viennese'                                           | 0.06079271 |     0.07355918
 'antagonizing' 'are' 'broths' 'discontinuing' 'dog' 'estuary' 'rosins' 'tanzania'                                           | 0.06079271 |     0.07355918
 'borderlines' 'brunei' 'cutthroat' 'dog' 'ego' 'herrick' 'neptunium' 'scoffs'                                               | 0.06079271 |     0.07355918
 'clock' 'decrescendos' 'dog' 'obsolete' 'peg' 'robocalls' 'rote' 'rousing' 'wabash'                                         | 0.06079271 |     0.07355918
 'absenting' 'comfortingly' 'dog' 'laurence' 'mascots' 'monitors' 'properest' 'scratch' 'terrific' 'unsoundest' 'victualled' | 0.06079271 |     0.06687198
 'advocates' 'angiosperm' 'dog' 'embarrassments' 'leafleting' 'sacrileges' 'toneless' 'whets'                                | 0.06079271 |     0.06687198
 'creeping' 'dissembles' 'distaste' 'dog' 'escapee' 'excusable' 'mickey' 'noose' 'onus' 'physiologists' 'taklamakan'         | 0.06079271 |     0.06687198
 'abusive' 'buddhists' 'comedienne' 'dog' 'gust' 'opts' 'pressed'                                                            | 0.06079271 |     0.06687198
 'chungking' 'depositor' 'dog' 'jutted' 'nothings' 'warfare'                                                                 | 0.06079271 |     0.06687198
 'aconite' 'dog' 'domesticity' 'dowdies' 'pupils' 'undistinguished'                                                          | 0.06079271 |     0.06687198
 'contaminant' 'disinters' 'dog' 'embroidery' 'huskiest' 'lender' 'outlines' 'oversize' 'professor' 'tractors'               | 0.06079271 |     0.06687198
 'dog' 'leavens' 'overtly' 'plucked' 'transfigure'                                                                           | 0.06079271 |     0.06687198
(12 rows)

You can see that our ts_rank_custom favors entries with shorter words.

Basically, you can implement any ranking logic you want inside a function with the given signature and use it instead of the standard ts_rank* functions.

Upvotes: 2

Related Questions