Full text search in multiple columns with PostgreSQL

Question

I just started using postgreSQL for fuzzy text matching. I have two columns: job_title and company_name.

A typical full text search would concatenate job_title and company_name, then return search text result according to a single ranking.

However, treating text match in two columns equally can be problematic in my case. For example, Search Engineer at Google Co. should not be ranked equally with Google Search at Engineer Co.

I know that I can assign different weights for each column. However, I don't have a reason to weight one more important than other.

How do I match my keywords against each of the columns separately, and return some "match score" on each keywords?

Something like:

Jobs.where("(to_tsvector('english', position) @@ plainto_tsquery(:q)) AND 

(to_tsvector('english',company) @@ plainto_tsquery(:q))", q: "Search Engineer", q: "Google")

Denis de Bernardy · Accepted Answer

As you've pointed out, you can concatenate tsvectors:

# select to_tsvector('job description') ||
         to_tsvector('company as keyword') ||
         to_tsvector('job description as body') as vector;
                          vector                           
-----------------------------------------------------------
 'bodi':9 'compani':3 'descript':2,7 'job':1,6 'keyword':5
(1 row)

And you can also assign them weights:

# select (setweight(to_tsvector('job description'), 'A') ||
         setweight(to_tsvector('company as keyword'), 'B') ||
         setweight(to_tsvector('job description as body'), 'D')) as vector;
                            vector                             
---------------------------------------------------------------
 'bodi':9 'compani':3B 'descript':2A,7 'job':1A,6 'keyword':5B
(1 row)

You can also play around with ts_rank_cd(). In particular, you could change the way the scores are normalized.

http://www.postgresql.org/docs/current/static/textsearch-controls.html

In your case, it seems that you want to combine two separate queries instead. An ugly but possibly adequate solution might look like this:

select sum(rank) as rank, ...
from (
   select ...
   union all
   select ...
   ) as sub
group by ...
order by sum(rank) desc
limit 10

As you can see, it's not very pretty. It's also a boulevard for aggregating potentially large sets of matching rows. Imho, you'll be better off sticking with the built-in tsvector arithmetic and tweaking weights if needed.

Full text search in multiple columns with PostgreSQL

Answers (1)

Related Questions