muffin
muffin

Reputation: 2104

Postgresql : Find the string with the closest substring match

If I have a table test with values like :

id | value
----------------
1  | ABC 1-2-3
2  | AB 1-2-3-4-5
3  | ABC 1
4  | ABC 1-2
5  | ABC

and the input string I'm trying to is ABC 1-2-3-4-5, then the closest substring match (if I could call it that) should be ABC 1-2-3. Row # 2 should not match because it doesn't have the "ABC". I've only been able to search for the string if the input string is shorter than the actual records, but not if it's longer. e.g

select * from test where value ilike 'ABC 1-2%';

but this also does not give me an exact record, but only those starting with ABC 1-2. How do I construct the proper sql statement to solve this?

Upvotes: 2

Views: 3068

Answers (2)

Bohemian
Bohemian

Reputation: 425013

Reverse the comparison:

select * from test
where 'ABC 1-2-3-4-5' ilike value || '%'
order by length(value) desc

The best (ie longest) matches will be returned first.

Upvotes: 3

klin
klin

Reputation: 121604

You may be interested in pg_trgm extension:

create extension if not exists pg_trgm;

Standard similarities for your data are as follows:

select *, similarity(value, 'ABC 1-2-3-4-5')
from test
order by 3 desc;

 id |    value     | similarity 
----+--------------+------------
  2 | AB 1-2-3-4-5 |        0.8
  1 | ABC 1-2-3    |   0.714286
  4 | ABC 1-2      |   0.571429
  3 | ABC 1        |   0.428571
  5 | ABC          |   0.285714
(5 rows)

However you can always add additional criteria in WHERE clause:

select *, similarity(value, 'ABC 1-2-3-4-5')
from test
where value ilike 'abc%'
order by 3 desc;

 id |   value   | similarity 
----+-----------+------------
  1 | ABC 1-2-3 |   0.714286
  4 | ABC 1-2   |   0.571429
  3 | ABC 1     |   0.428571
  5 | ABC       |   0.285714
(4 rows)    

Upvotes: 7

Related Questions