rshar
rshar

Reputation: 1477

Match column values and ignoring special characters in PostgreSQL 11.0

I have following table in PostgreSQL 11.0

name                                            identifier  name_matched
anti-angin® formula                             3792673     anti angin
anti-angin® formula                             3792673     anti angin
arava® 20 mg tablets                            D25155      arava 20 mg
nifedipine extended release tablets 60 mg       23766       nifedipine

I using following query to match name and name_matched values.

select * from tbl
where name ilike '%' || name_matched || '%' 

Above query only give row4 as an output. How can I modify the query to fetch all four rows (ignoring special characters and then match the strings)

Any help is highly appreciated

Upvotes: 0

Views: 618

Answers (2)

Abelisto
Abelisto

Reputation: 15614

select * from tbl
where
  array(select x[1] from regexp_matches(name_matched, '([a-zA-Z0-9]+)', 'g') as x) <@
  array(select x[1] from regexp_matches(name, '([a-zA-Z0-9]+)', 'g') as x);

or to make it shorter:

create function regexp_matches_array(astr text, apattern text)
  returns text[]
  language sql
  immutable
  strict
as $func$
  select array_agg(x[1]) from regexp_matches(astr, apattern, 'g') as x
$func$

select * from tbl
where
  regexp_matches_array(name_matched, '([a-zA-Z0-9]+)') <@
  regexp_matches_array(name, '([a-zA-Z0-9]+)');

Note that it does not taking in account the words order.

Upvotes: 1

Mike Organek
Mike Organek

Reputation: 12494

This should work for the value combinations shown in your sample data. Given that you need to work around - characters, there may be other cases you need to handle. If any turn up, comment, and I will see if I can help you work through them.

select *
  from tbl
 where replace(regexp_replace(name, '[^A-Za-z0-9 -]', ''), '-', ' ') ~*
       replace(regexp_replace(name_matched, '[^A-Za-z0-9 -]', ''), '-', ' ')
;

Upvotes: 0

Related Questions