Reputation: 1787
Suppose you have two tables with substitutions which MUST be kept as they are and another table containing a body of names. How could I get all the possible substitutions?
Substitution Table
--------------------------------------
word subs_list
MOUNTAIN MOUNTAIN, MOUNT, MT, MTN
HOUSE HAUS, HOUSE
VIEW VU, VIEW
Synonyms table
-------------------------------------------------
EDUCATION SCHOOL, UNIVERSITY, COLLEGE, TRAINING
FOOD STORE, FOOD, CAFE
STORE FOOD, STORE, MARKET
REFRIGERATION FOODLOCKER, FREEZE, FRIDGE
names table
------------------------------------------------
MOUNT VU FOOD USA
MOUNTAIN VU STORE CA
Note: I know that it would be desirable to have just one substitution table, but both substution tables must remain because they served to additional purposes than the one explained above, those tables are already in used. In addition, the list of replacements in both tables are just a varchar with a string separated by commas
Considering the previous, the problem is to generate possible names derived by substitution. For instance, the name MOUNT VU FOOD USA
should be decomposed to MOUNTAIN VIEW FOOD USA
and MOUNTAIN VIEW STORE USA
, the same fashion would apply for the second.
I have been able to get the replacements in a wrong order and all together in function, there is a way to get an array as output with the different names generated after replacement? So far I have created this function for replacement:
create or replace function replace_companies_array(i_sentence IN VARCHAR) returns VARCHAR[] AS $p_replaced$
DECLARE
p_replaced VARCHAR[];
subs RECORD;
flag boolean:= True;
cur_s CURSOR(i_sentence VARCHAR)
FOR SELECT w.input, coalesce(x.word, w.input) as word, count(*) OVER (PARTITION BY w.input) as counter
FROM regexp_split_to_table(trim(i_sentence), '\s') as w(input)
LEFT JOIN (
select s.word, trim(s1.token) as token
from subs01 s
cross join unnest(string_to_array(s.subs_list, ',')) s1(token)
union
select sy.word, trim(s2.token) as token
from syns01 sy
cross join unnest(string_to_array(sy.syn_list, ',')) s2(token)
) as x on lower(trim(w.input)) = lower(x.token)
order by counter;
BEGIN
OPEN cur_s(i_sentence);
LOOP
--fetch row into the substitutions
FETCH cur_s INTO subs;
--Exit when no more rows to fetch
EXIT WHEN NOT FOUND;
SELECT REGEXP_REPLACE(i_sentence,'(^|[^a-z0-9])' || subs.input || '($|[^a-z0-9])','\1' || UPPER(subs.word) || '\2','g')
INTO i_sentence;
END LOOP;
p_replaced:=array_append(p_replaced, i_sentence);
RETURN p_replaced;
END;
$p_replaced$ LANGUAGE plpgsql;
Thank you so much for your contributions
Upvotes: 0
Views: 322
Reputation: 311
I didn't manage to get the final result, but I'w quite close to it!
From sentence: MOUNT VU FOOD USA
, I obtain {"MOUNTAIN VIEW MARKET USA","MOUNTAIN VIEW STORE USA","MOUNTAIN VIEW CAFE USA","MOUNTAIN VIEW FOOD USA"}
Here are all my script to recreate the synonyms & substitute tables:
DROP TABLE IF EXISTS subs01;
DROP TABLE IF EXISTS syns01;
CREATE TABLE subs01 (word VARCHAR(20), subs_list VARCHAR(200));
CREATE TABLE syns01 (word VARCHAR(20), syn_list VARCHAR(200));
INSERT INTO subs01 (word, subs_list) VALUES ('MOUNTAIN', 'MOUNTAIN, MOUNT, MT, MTN'),('HOUSE', 'HAUS, HOUSE'),('VIEW', 'VU, VIEW');
INSERT INTO syns01 (word, syn_list) VALUES ('EDUCATION', 'SCHOOL, UNIVERSITY, COLLEGE, TRAINING'),('FOOD', 'STORE, FOOD, CAFE'),('STORE', 'FOOD, STORE, MARKET'),('REFRIGERATION', 'FOODLOCKER, FREEZE, FRIDGE');
I decided to split the job into 2 phases:
Substitute the words:
CREATE OR REPLACE function substitute_words (i_sentence IN VARCHAR) returns VARCHAR AS $p_substituted$
DECLARE
--p_substituted VARCHAR;
subs_cursor CURSOR FOR select su.word, trim(s2.token) as token from subs01 su cross join unnest(string_to_array(su.subs_list, ',')) s2(token);
subs_record record;
BEGIN
OPEN subs_cursor;
LOOP
FETCH subs_cursor INTO subs_record;
EXIT WHEN NOT FOUND;
RAISE NOTICE 'INFO : TOKEN (%) ',subs_record.token ;
IF i_sentence LIKE '%'|| subs_record.token || '%' THEN
RAISE NOTICE '-- FOUND : TOKEN (%) ',subs_record.token ;
SELECT replace (i_sentence, subs_record.token, subs_record.word) INTO i_sentence;
END IF;
END LOOP;
CLOSE subs_cursor;
RETURN i_sentence;
END
$p_substituted$ LANGUAGE plpgsql;
Replace known words by their synomyms:
CREATE OR REPLACE function synonymize_sentence (i_sentence IN VARCHAR) returns TABLE (sentence_result VARCHAR) AS $p_syn$
DECLARE
syn_cursor CURSOR FOR select su.word, trim(s2.token) as token from syns01 su cross join unnest(string_to_array(su.syn_list, ',')) s2(token);
syn_record record;
BEGIN
CREATE TEMPORARY TABLE record_syn (result VARCHAR(200)) ON COMMIT DROP;
INSERT INTO record_syn (result) SELECT i_sentence;
OPEN syn_cursor;
LOOP
FETCH syn_cursor INTO syn_record;
EXIT WHEN NOT FOUND;
RAISE NOTICE 'INFO : WORD (%) ',syn_record.word ;
INSERT INTO record_syn (result) SELECT replace (result, syn_record.word, syn_record.token) FROM record_syn where result LIKE '%'|| syn_record.word || '%';
END LOOP;
CLOSE syn_cursor;
RETURN QUERY SELECT distinct result FROM record_syn;
END;
$p_syn$ LANGUAGE plpgsql;
Then, to generate the result array, I perform this statement:
SELECT ARRAY(SELECT synonymize_sentence (substitute_words ('MOUNT VU FOOD USA')));
Upvotes: 1