bilaltehseen
bilaltehseen

Reputation: 101

How to replace every possible subset of a string in PostgeSQL

I am looking for a block of code in PostgreSQL to replace every possible subset of a string from a list,I have a list of synonyms and want to replace every synonyms in a string example if I have a string named "ABS PVT LTD" and I have synonyms for PVT as PRIVATE and LTD as LIMITED and viceversa then I want my output to be an array of all possible combination i.e.

ABS PVT LTD , ABS PVT LIMITED , ABS PRIVATE LTD , ABS PRIVATE LIMITED

for my current approach I am getting the output but not all possible combiation i.e. ABS PVT LIMITED , ABS PRIVATE LIMITED

I have attached my current code

    coalesce(array_agg(regexp_replace(stringtransformation('ABS PVT LTD', true),concat('\y',keyword,'\y'), syno)),ARRAY[]::text[])
from
    synonyms s
where
    keyword in (
    select
        unnest(
   string_to_array(stringtransformation('ABS PVT LTD', true), ' ')
)
);

These are my synonyms

enter image description here

Upvotes: 1

Views: 59

Answers (1)

Mike Organek
Mike Organek

Reputation: 12484

There might be a simpler way, but I would use a combination of unions, joins, and recursion.

This CTE simulates a synonym table with only one side of each pairing:

with recursive synonyms_unidirectional (keyword, synonym) as (
  values ('PVT', 'PRIVATE'),
         ('LTD', 'LIMITED')
), 

Union it to itself to get the reverse synonyms:

synonyms as (
  select keyword, synonym from synonyms_unidirectional
  union 
  select synonym, keyword from synonyms_unidirectional
), 

This simulates a table to hold your company names:

company (id, company_name) as (
  values (1, 'ABS PVT LTD')
), 

Split the names into rows, and keep track of how many words are in each company name:

words as (
select c.id, w.*, max(w.n) over (partition by id) as nwords
  from company c
       cross join lateral 
         regexp_split_to_table(c.company_name, '\s+') with ordinality as w(word, n)
),

Preserve each original word and union that with a join to the bidirectional synonyms table (CTE):

words_and_synonyms as (
  select id, n, word as synonym, nwords
    from words
   union 
  select w.id, w.n, s.synonym, nwords
    from words w
         join synonyms s on s.keyword = w.word
)

Use recursion to rebuild each possible combination of keywords and synonyms:

, rejoin as (
  select id, n, array[synonym] as company_name, nwords
    from words_and_synonyms 
   where n = 1
  union all
  select p.id, c.n, p.company_name||c.synonym, p.nwords
    from rejoin p
         join words_and_synonyms c 
           on c.id = p.id
          and c.n = p.n + 1
)

Reconstruct the string from the array, and keep only the rows that have the right number of words:

select id, array_to_string(company_name, ' ') as company_name 
  from rejoin
 where n = nwords;

The result:

┌────┬─────────────────────┐
│ id │    company_name     │
├────┼─────────────────────┤
│  1 │ ABS PRIVATE LTD     │
│  1 │ ABS PVT LTD         │
│  1 │ ABS PRIVATE LIMITED │
│  1 │ ABS PVT LIMITED     │
└────┴─────────────────────┘
(4 rows)

db<>fiddle here

Upvotes: 1

Related Questions