glarkou
glarkou

Reputation: 7101

Similarity search for name surname

I have a column name which contains name surname (name space surname) and I would like to search it based on

Upvotes: 1

Views: 424

Answers (2)

sticky bit
sticky bit

Reputation: 37472

To counter the exchanged parts of the name you could use split_part() to split the name in its two parts and compare both of them, something similar to the following:

SELECT *
       FROM person
       WHERE split_part(name, ' ', 1) IN ('<given_name_searched_for>'
                                          '<surname_searched_for>')
              OR split_part(name, ' ', 2) IN ('<given_name_searched_for>'
                                              '<surname_searched_for>');

Or have a look at the other string functions and operators. -- there a variants of split functions using regular expressions, e.g..

Are there names like 'John F. Kennedy', that is, with more than one token? Are there names with more than one contiguous spaces? Bear in mind that these have to be addressed with further means if any. (Such things can get hairy. If possible consider revising your design and use a separate column for the surname.)

For the similarity part: PostgreSQL provides some modules, that might be useful here:

Upvotes: 0

klin
klin

Reputation: 121644

You should read about the pg_trgm extension and its function similarity(). A few examples below.

Example data:

create table my_table(id serial primary key, name text);
insert into my_table (name) values
('John Wilcock'),
('Henry Brown'),
('Jerry Newcombe');

create extension if not exists pg_trgm; -- install the extension

Example 1:

select *, 
    similarity(name, 'john wilcock') as "john wilcock",
    similarity(name, 'wilcock john') as "wilcock john"
from my_table;

 id |      name      | john wilcock | wilcock john 
----+----------------+--------------+--------------
  1 | John Wilcock   |            1 |            1
  2 | Henry Brown    |            0 |            0
  3 | Jerry Newcombe |     0.037037 |     0.037037
(3 rows)

Example 2:

select *, 
    similarity(name, 'henry brwn') as "henry brwn",
    similarity(name, 'brovn henry') as "brovn henry"
from my_table;

 id |      name      | henry brwn | brovn henry 
----+----------------+------------+-------------
  1 | John Wilcock   |          0 |           0
  2 | Henry Brown    |   0.642857 |         0.6
  3 | Jerry Newcombe |       0.04 |   0.0384615
(3 rows)

Example 3:

select *
from my_table
where similarity(name, 'J Newcombe') >= 0.6;

 id |      name      
----+----------------
  3 | Jerry Newcombe
(1 row) 

Upvotes: 2

Related Questions