Reputation: 7101
I have a column name
which contains name surname
(name space surname) and I would like to search it based on
name
, surname
but I would like to match cases where people accidentally inserted surname name
in a different order names surnames
by 1-2 characters.Upvotes: 1
Views: 424
Reputation: 37472
To counter the exchanged parts of the name you could use split_part()
to split the name in its two parts and compare both of them, something similar to the following:
SELECT *
FROM person
WHERE split_part(name, ' ', 1) IN ('<given_name_searched_for>'
'<surname_searched_for>')
OR split_part(name, ' ', 2) IN ('<given_name_searched_for>'
'<surname_searched_for>');
Or have a look at the other string functions and operators. -- there a variants of split functions using regular expressions, e.g..
Are there names like 'John F. Kennedy', that is, with more than one token? Are there names with more than one contiguous spaces? Bear in mind that these have to be addressed with further means if any. (Such things can get hairy. If possible consider revising your design and use a separate column for the surname.)
For the similarity part: PostgreSQL provides some modules, that might be useful here:
Upvotes: 0
Reputation: 121644
You should read about the pg_trgm extension and its function similarity()
. A few examples below.
Example data:
create table my_table(id serial primary key, name text);
insert into my_table (name) values
('John Wilcock'),
('Henry Brown'),
('Jerry Newcombe');
create extension if not exists pg_trgm; -- install the extension
Example 1:
select *,
similarity(name, 'john wilcock') as "john wilcock",
similarity(name, 'wilcock john') as "wilcock john"
from my_table;
id | name | john wilcock | wilcock john
----+----------------+--------------+--------------
1 | John Wilcock | 1 | 1
2 | Henry Brown | 0 | 0
3 | Jerry Newcombe | 0.037037 | 0.037037
(3 rows)
Example 2:
select *,
similarity(name, 'henry brwn') as "henry brwn",
similarity(name, 'brovn henry') as "brovn henry"
from my_table;
id | name | henry brwn | brovn henry
----+----------------+------------+-------------
1 | John Wilcock | 0 | 0
2 | Henry Brown | 0.642857 | 0.6
3 | Jerry Newcombe | 0.04 | 0.0384615
(3 rows)
Example 3:
select *
from my_table
where similarity(name, 'J Newcombe') >= 0.6;
id | name
----+----------------
3 | Jerry Newcombe
(1 row)
Upvotes: 2