Reputation: 101
I am looking for a block of code in PostgreSQL to replace every possible subset of a string from a list,I have a list of synonyms and want to replace every synonyms in a string example if I have a string named "ABS PVT LTD" and I have synonyms for PVT as PRIVATE and LTD as LIMITED and viceversa then I want my output to be an array of all possible combination i.e.
ABS PVT LTD , ABS PVT LIMITED , ABS PRIVATE LTD , ABS PRIVATE LIMITED
for my current approach I am getting the output but not all possible combiation i.e. ABS PVT LIMITED , ABS PRIVATE LIMITED
I have attached my current code
coalesce(array_agg(regexp_replace(stringtransformation('ABS PVT LTD', true),concat('\y',keyword,'\y'), syno)),ARRAY[]::text[])
from
synonyms s
where
keyword in (
select
unnest(
string_to_array(stringtransformation('ABS PVT LTD', true), ' ')
)
);
These are my synonyms
Upvotes: 1
Views: 59
Reputation: 12484
There might be a simpler way, but I would use a combination of unions, joins, and recursion.
This CTE simulates a synonym table with only one side of each pairing:
with recursive synonyms_unidirectional (keyword, synonym) as (
values ('PVT', 'PRIVATE'),
('LTD', 'LIMITED')
),
Union it to itself to get the reverse synonyms:
synonyms as (
select keyword, synonym from synonyms_unidirectional
union
select synonym, keyword from synonyms_unidirectional
),
This simulates a table to hold your company names:
company (id, company_name) as (
values (1, 'ABS PVT LTD')
),
Split the names into rows, and keep track of how many words are in each company name:
words as (
select c.id, w.*, max(w.n) over (partition by id) as nwords
from company c
cross join lateral
regexp_split_to_table(c.company_name, '\s+') with ordinality as w(word, n)
),
Preserve each original word and union that with a join to the bidirectional synonyms table (CTE):
words_and_synonyms as (
select id, n, word as synonym, nwords
from words
union
select w.id, w.n, s.synonym, nwords
from words w
join synonyms s on s.keyword = w.word
)
Use recursion to rebuild each possible combination of keywords and synonyms:
, rejoin as (
select id, n, array[synonym] as company_name, nwords
from words_and_synonyms
where n = 1
union all
select p.id, c.n, p.company_name||c.synonym, p.nwords
from rejoin p
join words_and_synonyms c
on c.id = p.id
and c.n = p.n + 1
)
Reconstruct the string from the array, and keep only the rows that have the right number of words:
select id, array_to_string(company_name, ' ') as company_name
from rejoin
where n = nwords;
The result:
┌────┬─────────────────────┐
│ id │ company_name │
├────┼─────────────────────┤
│ 1 │ ABS PRIVATE LTD │
│ 1 │ ABS PVT LTD │
│ 1 │ ABS PRIVATE LIMITED │
│ 1 │ ABS PVT LIMITED │
└────┴─────────────────────┘
(4 rows)
db<>fiddle here
Upvotes: 1